Gene families associated with cancers

ABSTRACT

The invention relates generally to the changes in gene expression in human tissues from cancer patients. The invention relates specifically to human gene families which are differentially expressed in cancer tissues of breast, colon, esophagus, kidney, liver, lung, lymph node, ovary, pancreas, prostate, rectum, and/or stomach compared to corresponding normal tissues.

FIELD OF THE INVENTION

The present invention relates to the changes in gene expression in human tissues from cancer patients. The invention specifically relates to human genes which are differentially expressed in cancer tissues of breast, colon, esophagus, kidney, liver, lung, lymph node, ovary, pancreas, prostate, rectum, and/or stomach compared to corresponding normal tissues.

BACKGROUND OF THE INVENTION

In the United States, more than one million new cancer cases are diagnosed and about half a million people die of cancer each year. The causes of cancer are many and varied, and include genetic predisposition, environmental influences, infectious agents and ageing. These transform normal cells into cancerous ones by derailing a wide spectrum of regulatory and downstream effector pathways. Several essential alterations in cell physiology collectively dictate malignant growth: self-sufficiency in growth signals, insensitivity to growth-inhibitory signals, evasion of programmed cell death, limitless replicative potential, sustained angiogenesis, and tissue invasion and metastasis (Hanahan and Weinberg (2000), Cell 100:57-70).

To date, researchers have been able to identify many genetic alterations believed to underlie tumor development. These genetic alterations include amplification of oncogenes and mutations that result in the loss of tumor suppressor genes. Oncogenes were initially identified as genes carried by viruses that cause transformation of their target cells. A major class of the viral oncogenes has cellular counterparts that are involved in normal cell functions. The cellular genes are called proto-oncogenes, and in certain cases their mutation or aberrant expression in the cell is associated with tumor formation. The generation of an oncogene represents a gain-of-function in which a cellular proto-oncogene is inappropriately activated. This can involve a mutational change in the protein, or constitutive activation, over-expression, or failure to turn off expression at the appropriate time. About 100 oncogenes have been identified. Examples of oncogenes include, but are not limited to, ras, fos, myc, abl, and myb (Ponder (2001), Nature 411:336-341). Tumor suppressor genes, in their wild-type alleles, express proteins that suppress abnormal cellular proliferation. When the gene coding for a tumor suppressor protein is mutated or deleted, the resulting mutant protein or the complete lack of tumor suppressor protein expression may fail to correctly regulate cellular proliferation, and abnormal proliferation may take place, particularly if there is already existing damage to the cellular regulatory mechanism. A number of well-studied human tumors and tumor cell lines have missing or non-functional tumor suppressor genes. Examples of tumor suppressor genes include, but are not limited to, the retinoblastoma susceptibility gene or RB gene, the p53 gene, the deleted in colon carcinoma (DCC) gene and the neurofibromatosis type 1 (NF-1) tumor suppressor gene (Weinberg (1991), Science 254:1138-1146). Loss-of-function or inactivation of tumor suppressor genes may play a central role in the initiation and/or progression of a significant number of human cancers.

The utilization of genome-wide expression profiles to classify tumors, to identify drug targets, to identify diagnostic markers and/or to gain further insights into the consequences of chemotherapeutic treatments could facilitate the design of more efficacious stratagems for treating a variety of cancers. Initial studies utilizing gene expression patterns to identify subtypes of cancer produced rather intriguing results (see Perou et al. (1999), Proc Natl Acad Sci USA 96:9212-9217; Golub et al. (1999), Science 286:531-537; Alizadeh et al. (2000), Nature 403:503-511; Alon et al. (1999), Proc Natl Acad Sci USA 96:6745-6750; Bittner et al. (2000), Nature 406:536-540; and Perou et al. (2000), Nature 406:747-752). Molecular classification of B-cell lymphoma by gene expression profiling elucidated clinically distinct diffuse large-B-cell lymphoma subgroups (see Alizadeh et al., supra). In breast cancer, studies utilizing limited numbers of genes (8,102 genes) have classified tumors into subtypes based on gene expression profiles, and these studies indicated a diversity of molecular phenotypes associated with breast tumors (see Perou et al., supra). In addition, expression profiling has enabled researchers to map tissue-specific expression levels for thousands of genes (Alon et al. (1999), Proc Natl Acad Sci USA 96:6745-6750; Iyer et al. (1999), Science 283:83-87; Khan et al. (1998), Cancer Res 58:5009-5013; Lee et al. (1999), Science 285:1390-1393; Wang et al. (1999), Gene 229:101-108; Whitney et al. (1999), Ann Neurol 46:425-428). Although these studies have demonstrated that expression profiling may be used to produce improvements in diagnosis of human diseases such as cancer, as well as in the development of improved therapeutic strategies, further studies are needed.

Although cancers are diverse and heterogeneous as they are derived from numerous tissues and multiple etiologic factors, it has been suggested that underlying this variability lies a relatively small number of critical events whose convergence is required for the development of any and all cancers (Evan and Vousden (2001), Nature 411:342-348). Accordingly, there exists a need for the comprehensive investigation of the changes in global gene expression levels in many different types of cancers to identify critical molecular markers associated with the development and progression of cancer. There remains a need in the art for materials and methods that permit a more accurate diagnosis of cancer. In addition, there remains a need in the art for methods to treat and methods to identify agents that can effectively treat this disease. The present invention meets these and other needs.

SUMMARY OF THE INVENTION

The present invention is based on new genes that are differentially expressed in cancer tissues compared to normal tissues, hereinafter referred to as LFG1, LFG2, LFG3, LFG4, LFG5 and LFG6. The invention includes isolated nucleic acid molecules comprising SEQ ID NO: 1, 3, 5, 7, 9, 11, 13 or 15 or the complement thereof.

The present invention further includes the nucleic acid molecules operably linked to one or more expression control elements, including vectors comprising the isolated nucleic acid molecules. The invention further includes host cells transformed to contain the nucleic acid molecules of the invention and methods for producing a protein comprising the step of culturing a host cell transformed with a nucleic acid molecule of the invention under conditions in which the protein is expressed.

The invention further provides an isolated polypeptide selected from the group consisting of an isolated polypeptide comprising the amino acid sequence of SEQ ID NO: 2, 4, 6, 8, 10, 12, 14 or 16, an isolated polypeptide comprising a fragment of at least 10 amino acids of SEQ ID NO: 2, 4, 6, 8, 10, 12, 14 or 16, an isolated polypeptide comprising conservative amino acid substitutions of SEQ ID NO: 2, 4, 6, 8, 10, 12, 14 or 16 and an isolated polypeptide comprising naturally occurring amino acid sequence variants of SEQ ID NO: 2, 4, 6, 8, 10, 12, 14 or 16. Polypeptides of the invention also include polypeptides with an amino acid sequence having at least about 50%, 60%, 70% or 75% amino acid sequence identity with the sequence set forth in SEQ ID NO: 2, 4, 6, 8, 10, 12, 14 or 16, preferably at least about 80%, more preferably at least about 90-95%, and most preferably at least about 95-98% sequence identity with the sequence set forth in SEQ ID NO: 2, 4, 6, 8, 10, 12, 14 or 16.

The present invention further provides methods of identifying other members of the polypeptide family of the invention. Specifically, the nucleic acid sequence of SEQ ID NO: 1, 3, 5, 7, 9, 11, 13 or 15 can be used as a probe, or to generate PCR primers, in methods to identify nucleic acid molecules that encode other members of the LFG1, LFG2, LFG3, LFG4, LFG5 or LFG6 family of proteins.

The invention further provides an isolated antibody or antigen-binding antibody fragment that specifically binds to a polypeptide of the invention, including monoclonal and polyclonal antibodies.

The invention further provides methods of identifying an agent which modulates the expression of a nucleic acid molecule encoding a protein of the invention, comprising: exposing cells which express the nucleic acid molecule to the agent; and determining whether the agent modulates expression of said nucleic acid molecule, thereby identifying an agent which modulates the expression of a nucleic acid molecule encoding the protein.

The invention further provides methods of identifying an agent which modulates the level of or at least one activity of a protein of the invention, comprising: exposing cells which express the protein to the agent; and determining whether the agent modulates the level of or at least one activity of said protein, thereby identifying an agent which modulates the level of or at least one activity of the protein.

The present invention further provides methods of modulating the expression of a nucleic acid molecule encoding a protein of the invention, comprising the step of administering an effective amount of an agent which modulates the expression of a nucleic acid molecule encoding the protein. The invention also provides methods of modulating at least one activity of a protein of the invention, comprising the step of administering an effective amount of an agent which modulates at least one activity of the protein of the invention.

The invention further provides methods of identifying binding partners for a protein of the invention, comprising the steps of exposing said protein to a potential binding partner; and determining if the potential binding partner binds to said protein, thereby identifying binding partners for the protein.

The present invention further provides methods to identify agents that can block or modulate the association of a protein of the invention with a binding partner. Specifically, an agent can be tested for the ability to block, reduce or otherwise modulate the association of a protein of the invention with a binding partner by contacting said protein, or a fragment thereof, and a binding partner with a test agent and determining whether the test agent blocks or reduces the binding of the protein of the invention to the binding partner.

The present invention further provides methods for reducing or blocking the association of a protein of the invention with one or more of its binding partners, comprising the step of administering an effective amount of an agent which reduces or blocks the binding of said protein to the binding partner. The method can utilize an agent that binds to the protein of the invention or to the binding partner.

In accordance with another aspect of the invention, the proteins of the invention can be used as starting points for rational drug design to provide ligands, therapeutic drugs or other types of small chemical molecules. Alternatively, small molecules or other compounds identified by the above-described screening assays may serve as “lead compounds” in rational drug design.

The present invention further relates to a process for treating cancer comprising inserting into a cancerous cell a nucleic acid construct comprising the nucleic acid molecules of the invention operably linked to a promoter or enhancer element such that expression of said nucleic acid molecule causes suppression of said cancer.

The present invention further includes non-human transgenic animals modified to contain the nucleic acid molecules of the invention, or non-human transgenic animals modified to contain the mutated nucleic acid molecules such that expression of the encoded polypeptides of the invention is prevented.

The present invention also includes non-human transgenic animals in which all or a portion of a gene comprising all or a portion of SEQ ID NO: 1, 3, 5, 7, 9, 11, 13 or 15 has been knocked out or deleted from the genome of the animal.

The invention further provides methods of diagnosing cancers, comprising the steps of acquiring a tissue, blood, urine or other sample from a subject and determining the level of expression of a nucleic acid molecule of the invention or polypeptide of the invention.

The invention further includes compositions comprising a diluent and a polypeptide or protein selected from the group consisting of an isolated polypeptide comprising the amino acid sequence of SEQ ID NO: 2, 4, 6, 8, 10, 12, 14 or 16, an isolated polypeptide comprising a fragment of at least 10 amino acids of SEQ ID NO: 2, 4, 6, 8, 10, 12, 14 or 16, an isolated polypeptide comprising conservative amino acid substitutions of SEQ ID NO: 2, 4, 6, 8, 10, 12, 14 or 16, naturally occurring amino acid sequence variants of SEQ ID NO: 2, 4, 6, 8, 10, 12, 14 or 16 and an isolated polypeptide with an amino acid sequence having at least about 50%, 60%, 70% or 75% amino acid sequence identity with the sequence set forth in SEQ ID NO: 2, 4, 6, 8, 10, 12, 14 or 16, preferably at least about 80%, more preferably at least about 90-95%, and most preferably at least about 95-98% sequence identity with the sequence set forth in SEQ ID NO: 2, 4, 6, 8, 10, 12, 14 or 16.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows the relative alignment positions of the two LFG1 clones.

FIG. 2 is a hydrophobicity plot of the protein encoded by the open reading frame of LFG1-Clone A (SEQ ID NO: 2). Analysis was performed according to the method of Kyte-Doolittle.

FIG. 3 is a hydrophobicity plot of the protein encoded by the open reading frame of LFG1-Clone B (SEQ ID NO: 4). Analysis was performed according to the method of Kyte-Doolittle.

FIG. 4 is a hydrophobicity plot of the protein encoded by the open reading frame of LFG2 (SEQ ID NO: 6). Analysis was performed according to the method of Kyte-Doolittle.

FIG. 5 is a hydrophobicity plot of the protein encoded by the open reading frame of LFG3 (SEQ ID NO: 8). Analysis was performed according to the method of Kyte-Doolittle.

FIG. 6 is a hydrophobicity plot of the protein encoded by the open reading frame of LFG4 (SEQ ID NO: 10). Analysis was performed according to the method of Kyte-Doolittle.

FIG. 7 is a hydrophobicity plot of the protein encoded by the open reading frame of LFG5 (SEQ ID NO: 12). Analysis was performed according to the method of Kyte-Doolittle.

FIG. 8 shows the relative alignment positions of the two LFG6 clones.

FIG. 9 is a hydrophobicity plot of the protein encoded by the open reading frame of LFG6-#20 (SEQ ID NO: 14). Analysis was performed according to the method of Kyte-Doolittle.

FIG. 10 is a hydrophobicity plot of the protein encoded by the open reading frame of LFG6-46 (SEQ ID NO: 16). Analysis was performed according to the method of Kyte-Doolittle.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

I. General Description

The present invention is based in part on the identification of new gene families that are differentially expressed in cancerous human tissues compared to normal human tissues. These gene families correspond to the human cDNA of SEQ ID NOS: 1, 3, 5, 7, 9, 11, 13 and 15.

The genes and proteins of the invention may be used as diagnostic agents or markers to detect cancer or to differentiate carcinoma from normal tissue in a sample. They can also serve as a target for agents that modulate gene expression or activity. For example, agents may be identified that modulate biological processes associated with tumor growth, including the hyperplastic process of cancer.

II. Specific Embodiments

A. The Proteins Associated with Cancer

The present invention provides isolated proteins, allelic variants of the proteins, and conservative amino acid substitutions of the proteins. As used herein, the “protein” or “polypeptide” refers, in part, to a protein that has the human amino acid sequence depicted in SEQ ID NO: 2, 4, 6, 8, 10, 12, 14 or 16. The terms also refer to naturally occurring allelic variants and proteins that have a slightly different amino acid sequence than that specifically recited above. Allelic variants, though possessing a slightly different amino acid sequence than those recited above, will still have the same or similar biological functions associated with these proteins.

As used herein, the family of proteins related to the human amino acid sequence of SEQ ID NO: 2, 4, 6, 8, 10, 12, 14 or 16 refers to proteins that have been isolated from organisms in addition to humans. The methods used to identify and isolate other members of the family of proteins related to these proteins are described below.

The proteins of the present invention are preferably in isolated form. As used herein, a protein is said to be isolated when physical, mechanical or chemical methods are employed to remove the protein from cellular constituents that are normally associated with the protein. A skilled artisan can readily employ standard purification methods to obtain an isolated protein.

The proteins of the present invention further include insertion, deletion or conservative amino acid substitution variants of SEQ ID NO: 2, 4, 6, 8, 10, 12, 14 or 16. As used herein, a conservative variant refers to alterations in the amino acid sequence that do not adversely affect the biological functions of the protein. A substitution, insertion or deletion is said to adversely affect the protein when the altered sequence prevents or disrupts a biological function associated with the protein. For example, the overall charge, structure or hydrophobic/hydrophilic properties of the protein, in certain instances, may be altered without adversely affecting a biological activity. Accordingly, the amino acid sequence can be altered, for example to render the peptide more hydrophobic or hydrophilic, without adversely affecting the biological activities of the protein.

Ordinarily, the allelic variants, the conservative substitution variants, and the members of the protein family, will have an amino acid sequence having at least about 50%, 60%, 70% or 75% amino acid sequence identity with the sequence set forth in SEQ ID NO: 2, 4, 6, 8, 10, 12, 14 or 16, more preferably at least about 80%, even more preferably at least about 90-95%, and most preferably at least about 95-98% sequence identity. Identity or homology with respect to such sequences is defined herein as the percentage of amino acid residues in the candidate sequence that are identical with SEQ ID NO: 2, 4, 6, 8, 10, 12, 14 or 16, after aligning the sequences and introducing gaps, if necessary, to achieve the maximum percent homology, and not considering any conservative substitutions as part of the sequence identity (see section B for the relevant parameters). Fusion proteins, or N-terminal, C-terminal or internal extensions, deletions, or insertions into the peptide sequence shall not be construed as affecting homology.
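The identity definition above can be illustrated with a minimal Python sketch. It assumes the two sequences have already been aligned (with "-" marking gaps) by some external tool; the function name and the example sequences are hypothetical and are not taken from the sequence listing. Following the definition, the denominator is the number of residues in the candidate sequence, and conservative substitutions are not counted as identical.

```python
def percent_identity(aligned_candidate: str, aligned_reference: str, gap: str = "-") -> float:
    """Percentage of candidate residues identical to the aligned reference residue.

    Both inputs must come from the same pairwise alignment, so they have
    equal length and use `gap` for alignment gaps.
    """
    if len(aligned_candidate) != len(aligned_reference):
        raise ValueError("aligned sequences must have equal length")

    candidate_residues = 0
    identical = 0
    for cand, ref in zip(aligned_candidate.upper(), aligned_reference.upper()):
        if cand == gap:
            continue  # a gap column contributes no candidate residue
        candidate_residues += 1
        if cand == ref:
            identical += 1  # only exact matches count, not conservative substitutions

    if candidate_residues == 0:
        raise ValueError("candidate sequence contains no residues")
    return 100.0 * identical / candidate_residues


# Hypothetical aligned pair: 5 candidate residues, 4 of them identical.
print(percent_identity("MKT-LV", "MKTALI"))
```

A candidate scoring at or above one of the recited thresholds (for example 80%) under such a calculation would fall within the corresponding identity range.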

Thus, the proteins of the present invention include molecules having the amino acid sequence disclosed in SEQ ID NO: 2, 4, 6, 8, 10, 12, 14 or 16; fragments thereof having a consecutive sequence of at least about 3, 4, 5, 6, 10, 15, 20, 25, 30, 35 or more amino acid residues of these proteins; amino acid sequence variants wherein one or more amino acid residues has been inserted N- or C-terminal to, or within, the disclosed coding sequence; and amino acid sequence variants of the disclosed sequence, or their fragments as defined above, that have been substituted by at least one residue. Such fragments, also referred to as peptides or polypeptides, may contain antigenic regions, functional regions of the protein identified as regions of the amino acid sequence which correspond to known protein domains, as well as regions of pronounced hydrophilicity. The regions are all easily identifiable by using commonly available protein sequence analysis software such as MacVector (Oxford Molecular).
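As an illustration of how regions of pronounced hydrophilicity might be located, the following Python sketch computes a sliding-window hydropathy profile using the published Kyte-Doolittle scale (the method named for FIGS. 2-7, 9 and 10). It is only a sketch, not the MacVector implementation: the window size of 9 and the threshold of -1.0 are assumptions chosen for illustration, and the function names and example peptide are hypothetical.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic, negative = hydrophilic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence: str, window: int = 9) -> list:
    """Mean Kyte-Doolittle score for each window of `window` consecutive residues."""
    seq = sequence.upper()
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KYTE_DOOLITTLE[aa] for aa in seq]  # KeyError on non-standard residues
    return [
        sum(scores[start:start + window]) / window
        for start in range(len(seq) - window + 1)
    ]


def hydrophilic_windows(sequence: str, window: int = 9, threshold: float = -1.0) -> list:
    """(start index, mean score) for windows at or below the assumed threshold."""
    profile = hydropathy_profile(sequence, window)
    return [(i, score) for i, score in enumerate(profile) if score <= threshold]


# Hypothetical peptide with a lysine-rich (hydrophilic) core.
for start, score in hydrophilic_windows("IIIKKKIII", window=3):
    print(start, round(score, 2))
```

Windows with strongly negative mean scores are candidate hydrophilic, and therefore potentially surface-exposed or antigenic, regions; plotting the profile against residue position would give a plot of the kind shown in the figures.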

Contemplated variants further include those containing predetermined mutations by, e.g., homologous recombination, site-directed or PCR mutagenesis, and the corresponding proteins of other animal species, including but not limited to rabbit, mouse, rat, porcine, bovine, ovine, equine and non-human primate species, and the alleles or other naturally occurring variants of the family of proteins; and derivatives wherein the protein has been covalently modified by substitution, chemical, enzymatic, or other appropriate means with a moiety other than a naturally occurring amino acid (for example a detectable moiety such as an enzyme or radioisotope).

The present invention further provides compositions comprising a protein or polypeptide of the invention and a diluent. Suitable diluents can be aqueous or non-aqueous solvents or a combination thereof, and can comprise additional components, for example water-soluble salts or glycerol, that contribute to the stability, solubility, activity, and/or storage of the protein or polypeptide.

As described below, members of the families of proteins can be used: (1) to identify agents which modulate the level of or at least one activity of the protein, (2) to identify binding partners for the protein, (3) as an antigen to raise polyclonal or monoclonal antibodies, (4) as a therapeutic agent or target and (5) as a diagnostic agent or marker of cancer.

B. Nucleic Acid Molecules

The present invention further provides nucleic acid molecules that encode the protein having SEQ ID NO: 2, 4, 6, 8, 10, 12, 14 or 16 and the related proteins herein described, preferably in isolated form. As used herein, “nucleic acid” is defined as RNA or DNA that encodes a protein or peptide as defined above, is complementary to a nucleic acid sequence encoding such peptides, hybridizes to the nucleic acid of SEQ ID NO: 1, 3, 5, 7, 9, 11, 13 or 15 and remains stably bound to it under appropriate stringency conditions, encodes a polypeptide sharing at least about 50%, 60%, 70% or 75%, preferably at least about 80%, more preferably at least about 90-95%, and most preferably at least about 95-98% or more identity with the peptide sequence of SEQ ID NO: 2, 4, 6, 8, 10, 12, 14 or 16 or exhibits at least 50%, 60%, 70% or 75%, preferably at least about 80%, more preferably at least about 90-95%, and most preferably at least about 95-98% or more nucleotide sequence identity over the open reading frames of SEQ ID NO: 1, 3, 5, 7, 9, 11, 13 or 15.

The present invention further includes isolated nucleic acid molecules that specifically hybridize to the complement of SEQ ID NO: 1, 3, 5, 7, 9, 11, 13 or 15, particularly molecules that specifically hybridize over the open reading frames. Such molecules that specifically hybridize to the complement of SEQ ID NO: 1, 3, 5, 7, 9, 11, 13 or 15 typically do so under stringent hybridization conditions.

Specifically contemplated are genomic DNA, cDNA, mRNA and antisense molecules, as well as nucleic acids based on alternative backbones or including alternative bases, whether derived from natural sources or synthesized. Such hybridizing or complementary nucleic acids, however, are defined further as being novel and unobvious over any prior art nucleic acid including that which encodes, hybridizes under appropriate stringency conditions, or is complementary to nucleic acid encoding a protein according to the present invention.

Homology or identity at the nucleotide or amino acid sequence level is determined by BLAST (Basic Local Alignment Search Tool) analysis using the algorithm employed by the programs blastp, blastn, blastx, tblastn and tblastx (Altschul et al. (1997), Nucleic Acids Res. 25: 3389-3402, and Karlin et al. (1990), Proc. Natl. Acad. Sci. USA 87: 2264-2268, both fully incorporated by reference) which are tailored for sequence similarity searching. The approach used by the BLAST program is to first consider similar segments, with and without gaps, between a query sequence and a database sequence, then to evaluate the statistical significance of all matches that are identified and finally to summarize only those matches which satisfy a preselected threshold of significance. For a discussion of basic issues in similarity searching of sequence databases, see Altschul et al. (1994), Nat. Genet. 6: 119-129 which is fully incorporated by reference. The search parameters for histogram, descriptions, alignments, expect (i.e., the statistical significance threshold for reporting matches against database sequences), cutoff, matrix and filter (low complexity) are at the default settings. The default scoring matrix used by blastp, blastx, tblastn, and tblastx is the BLOSUM62 matrix (Henikoff et al. (1992), Proc. Natl. Acad. Sci. USA 89: 10915-10919, fully incorporated by reference), recommended for query sequences over 85 nucleotides or amino acids in length.

For blastn, the scoring matrix is set by the ratios of M (i.e., the reward score for a pair of matching residues) to N (i.e., the penalty score for mismatching residues), wherein the default values for M and N are 5 and -4, respectively. Four blastn parameters were adjusted as follows: Q=10 (gap creation penalty); R=10 (gap extension penalty); wink=1 (generates word hits at every wink-th position along the query); and gapw=16 (sets the window width within which gapped alignments are generated). The equivalent blastp parameter settings were Q=9; R=2; wink=1; and gapw=32. A Bestfit comparison between sequences, available in the GCG package version 10.0, uses DNA parameters GAP=50 (gap creation penalty) and LEN=3 (gap extension penalty) and the equivalent settings in protein comparisons are GAP=8 and LEN=2.
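To make the roles of M, N, Q and R concrete, the following Python sketch scores an already-aligned nucleotide pair using the blastn values stated above (M=5, N=-4, Q=10, R=10). It is a simplified illustration, not the BLAST program itself. One convention is assumed here and should be checked against the BLAST implementation actually used: the first position of a gap is charged Q and each further position of the same gap is charged R. The function name and example sequences are hypothetical.

```python
def blastn_style_score(aligned_query: str, aligned_subject: str,
                       match: int = 5, mismatch: int = -4,
                       gap_open: int = 10, gap_extend: int = 10,
                       gap: str = "-") -> int:
    """Raw score of a gapped pairwise alignment under M/N/Q/R-style parameters.

    Assumed convention: a run of k consecutive gapped columns costs
    gap_open + gap_extend * (k - 1). Adjacent gaps in the query and in the
    subject are treated as one run, which is a simplification.
    """
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have equal length")

    score = 0
    in_gap = False
    for q, s in zip(aligned_query.upper(), aligned_subject.upper()):
        if q == gap or s == gap:
            score -= gap_extend if in_gap else gap_open
            in_gap = True
        else:
            score += match if q == s else mismatch
            in_gap = False
    return score


# Hypothetical alignments: a perfect match, one mismatch, and a two-position gap.
print(blastn_style_score("ACGT", "ACGT"))
print(blastn_style_score("ACGT", "ACCT"))
print(blastn_style_score("ACGTT", "AC--T"))
```

With a reward of 5 and a penalty of -4, a single mismatch cancels most of the gain from two matches, and with Q=R=10 each gapped position costs as much as two matches earn, so gapped alignments must be flanked by substantial matching sequence to score well.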

“Stringent conditions” are those that (1) employ low ionic strength and high temperature for washing, for example, 0.015 M NaCl/0.0015 M sodium citrate/0.1% SDS at 50° C., or (2) employ during hybridization a denaturing agent such as formamide, for example, 50% (vol/vol) formamide with 0.1% bovine serum albumin/0.1% Ficoll/0.1% polyvinylpyrrolidone/50 mM sodium phosphate buffer at pH 6.5 with 750 mM NaCl, 75 mM sodium citrate at 42° C. Another example is hybridization in 50% formamide, 5×SSC (0.75 M NaCl, 0.075 M sodium citrate), 50 mM sodium phosphate (pH 6.8), 0.1% sodium pyrophosphate, 5× Denhardt's solution, sonicated salmon sperm DNA (50 μg/ml), 0.1% SDS, and 10% dextran sulfate at 42° C., with washes at 42° C. in 0.2×SSC and 0.1% SDS. A skilled artisan can readily determine and vary the stringency conditions appropriately to obtain a clear and detectable hybridization signal. Preferred molecules are those that hybridize under the above conditions to the complement of SEQ ID NO: 1, 3, 5, 7, 9, 11, 13 or 15 and which encode a functional or full-length protein. Even more preferred hybridizing molecules are those that hybridize under the above conditions to the complement strand of the open reading frame of SEQ ID NO: 1, 3, 5, 7, 9, 11, 13 or 15.
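The salt concentrations recited above are all multiples of a single SSC stock. Since 5×SSC is stated to be 0.75 M NaCl and 0.075 M sodium citrate, 1×SSC corresponds to 0.15 M NaCl and 0.015 M sodium citrate. The short Python sketch below works through that arithmetic; the function and constant names are hypothetical and introduced only for illustration.

```python
# 1x SSC, derived from the stated 5x SSC composition (0.75 M NaCl, 0.075 M sodium citrate).
SSC_1X_MOLAR = {"NaCl": 0.15, "sodium citrate": 0.015}


def ssc_concentrations(fold: float) -> dict:
    """Molar concentrations of NaCl and sodium citrate in `fold`-times SSC."""
    return {solute: fold * molar for solute, molar in SSC_1X_MOLAR.items()}


def ssc_fold_from_nacl(nacl_molar: float) -> float:
    """SSC strength (as a multiple of 1x) implied by a given NaCl molarity."""
    return nacl_molar / SSC_1X_MOLAR["NaCl"]


# Hybridization buffer and wash buffer from the second example above.
print(ssc_concentrations(5))
print(ssc_concentrations(0.2))
# Strength of the 0.015 M NaCl / 0.0015 M sodium citrate wash in example (1).
print(round(ssc_fold_from_nacl(0.015), 3))
```

On this arithmetic the 0.2×SSC wash contains about 0.03 M NaCl and 0.003 M sodium citrate, the low-ionic-strength wash of example (1) corresponds to about 0.1×SSC, and the 750 mM NaCl/75 mM sodium citrate of example (2) corresponds to 5×SSC.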

As used herein, a nucleic acid molecule is said to be “isolated” when the nucleic acid molecule is substantially separated from contaminant nucleic acid molecules encoding other polypeptides.

The present invention further provides fragments of the disclosed nucleic acid molecules. As used herein, a fragment of a nucleic acid molecule refers to a small portion of the coding or non-coding sequence. The size of the fragment will be determined by the intended use. For example, if the fragment is chosen so as to encode an active portion of the protein, the fragment will need to be large enough to encode the functional region(s) of the protein. For instance, fragments which encode peptides corresponding to predicted antigenic regions may be prepared. If the fragment is to be used as a nucleic acid probe or PCR primer, then the fragment length is chosen so as to obtain a relatively small number of false positives during probing/priming (see the discussion in Section G).

Fragments of the nucleic acid molecules of the present invention (i.e., synthetic oligonucleotides) that are used as probes or specific primers for the polymerase chain reaction (PCR), or to synthesize gene sequences encoding proteins of the invention, can easily be synthesized by chemical techniques, for example, the phosphoramidite method of Matteucci et al. ((1981), J. Am. Chem. Soc. 103: 3185-3191) or using automated synthesis methods. In addition, larger DNA segments can readily be prepared by well known methods, such as synthesis of a group of oligonucleotides that define various modular segments of the gene, followed by ligation of oligonucleotides to build the complete modified gene.

The nucleic acid molecules of the present invention may further be modified so as to contain a detectable label for diagnostic and probe purposes. A variety of such labels are known in the art and can readily be employed with the encoding molecules herein described. Suitable labels include, but are not limited to, biotin, radiolabeled or fluorescently labeled nucleotides and the like. A skilled artisan can readily employ any such label to obtain labeled variants of the nucleic acid molecules of the invention.

C. Isolation of Other Related Nucleic Acid Molecules

As described above, the identification and characterization of the nucleic acid molecule having SEQ ID NO: 1, 3, 5, 7, 9, 11, 13 or 15 allows a skilled artisan to isolate nucleic acid molecules that encode other members of the protein family in addition to the sequences herein described. Further, the presently disclosed nucleic acid molecules allow a skilled artisan to isolate nucleic acid molecules that encode other members of the family of proteins in addition to the proteins having SEQ ID NO: 2, 4, 6, 8, 10, 12, 14 or 16.

For instance, a skilled artisan can readily use the amino acid sequence of SEQ ID NO: 2, 4, 6, 8, 10, 12, 14 or 16 to generate antibody probes to screen expression libraries prepared from appropriate cells. Typically, polyclonal antiserum from mammals such as rabbits immunized with the purified protein (as described below) or monoclonal antibodies can be used to probe a mammalian cDNA or genomic expression library, such as a lambda gt11 library, to obtain the appropriate coding sequence for other members of the protein family. The cloned cDNA sequence can be expressed as a fusion protein, expressed directly using its own control sequences, or expressed by constructions using control sequences appropriate to the particular host used for expression of the enzyme.

Alternatively, a portion of the coding sequence herein described can be synthesized and used as a probe to retrieve DNA encoding a member of the protein family from any mammalian organism. Oligomers containing approximately 18-20 nucleotides (encoding about a 6-7 amino acid stretch) are prepared and used to screen genomic DNA or cDNA libraries to obtain hybridization under stringent conditions or conditions of sufficient stringency to eliminate an undue level of false positives.
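The probe-design step described above can be sketched in Python as follows. The sketch enumerates in-frame windows of a chosen length from a coding sequence and reports, for each, the antisense (reverse-complement) oligomer, the number of whole codons covered and the GC fraction. The function names, the default length of 18 nucleotides and the toy input sequence are assumptions made for illustration; the sequence is not taken from SEQ ID NO: 1, 3, 5, 7, 9, 11, 13 or 15.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def candidate_probes(coding_sequence: str, length: int = 18, step: int = 3) -> list:
    """In-frame oligomer candidates of `length` nucleotides, taken every `step` bases."""
    seq = coding_sequence.upper()
    if length < 1 or length > len(seq):
        raise ValueError("length must be between 1 and the sequence length")
    probes = []
    for start in range(0, len(seq) - length + 1, step):
        oligo = seq[start:start + length]
        probes.append({
            "start": start,
            "sense": oligo,
            "antisense": reverse_complement(oligo),
            "codons": length // 3,  # whole codons covered: 18 nt -> 6, 21 nt -> 7
            "gc": gc_fraction(oligo),
        })
    return probes


# Toy 24-nucleotide coding sequence (hypothetical).
for probe in candidate_probes("ATGGCCAAGCTGACCGATTGGTAA"):
    print(probe["start"], probe["sense"], probe["antisense"], probe["codons"], round(probe["gc"], 2))
```

An 18-nucleotide oligomer spans six codons, which matches the stated correspondence between probe length and the encoded amino acid stretch. In practice the candidates would be filtered further, for example for uniqueness against the target library.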

Additionally, pairs of oligonucleotide primers can be prepared for use in PCR to selectively clone an encoding nucleic acid molecule. A PCR denature/anneal/extend cycle for using such PCR primers is well known in the art and can readily be adapted for use in isolating other encoding nucleic acid molecules.
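When adapting a denature/anneal/extend cycle to a new primer pair, the annealing temperature is usually the parameter that needs adjusting. The specification does not prescribe a method for this; the Python sketch below uses the Wallace rule (2 °C per A or T, 4 °C per G or C), a common rough estimate for short oligonucleotides, and an assumed rule of thumb of annealing about 5 °C below the lower of the two melting temperatures. The function names, the offset and the example primers are all hypothetical.

```python
def wallace_tm(primer: str) -> int:
    """Approximate melting temperature (deg C) of a short primer by the Wallace rule."""
    p = primer.upper()
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    if not p or at + gc != len(p):
        raise ValueError("primer must be a non-empty string of A, C, G and T")
    return 2 * at + 4 * gc


def suggest_annealing_temp(forward: str, reverse: str, offset: int = 5) -> int:
    """Assumed rule of thumb: anneal `offset` deg C below the lower primer Tm."""
    return min(wallace_tm(forward), wallace_tm(reverse)) - offset


# Hypothetical 18-nucleotide primer pair.
forward = "ACGTACGTACGTACGTAC"
reverse = "ATATATATATATATATAT"
print(wallace_tm(forward), wallace_tm(reverse), suggest_annealing_temp(forward, reverse))
```

Such an estimate only gives a starting point; the annealing temperature would normally be refined empirically for the template and polymerase in use.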

Nucleic acid molecules encoding other members of the protein family may also be identified in existing genomic or other sequence information using any available computational method, including but not limited to: PSI-BLAST (Altschul et al. (1997), Nucl. Acids Res. 25: 3389-3402); PHI-BLAST (Zhang et al. (1998), Nucl. Acids Res. 26: 3986-3990); 3D-PSSM (Kelley et al. (2000), J. Mol. Biol. 299: 499-520); and other computational analysis methods (Shi et al. (1999), Biochem. Biophys. Res. Commun. 262: 132-138 and Matsunami et al. (2000), Nature 404: 601-604).

D. rDNA Molecules Containing a Nucleic Acid Molecule

The present invention further provides recombinant DNA molecules (rDNAs) that contain a coding sequence. As used herein, a rDNA molecule is a DNA molecule that has been subjected to molecular manipulation in situ. Methods for generating rDNA molecules are well known in the art, for example, see Sambrook et al., Molecular Cloning: A Laboratory Manual, Third Ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., 2001. In the preferred rDNA molecules, a coding DNA sequence is operably linked to expression control sequences and/or vector sequences.

The choice of vector and/or expression control sequences to which one of the protein family encoding sequences of the present invention is operably linked depends directly, as is well known in the art, on the functional properties desired, e.g., protein expression, and the host cell to be transformed. A vector contemplated by the present invention is at least capable of directing the replication or insertion into the host chromosome, and preferably also expression, of the structural gene included in the rDNA molecule.

Expression control elements that are used for regulating the expression of an operably linked protein encoding sequence are known in the art and include, but are not limited to, inducible promoters, constitutive promoters, secretion signals, and other regulatory elements. Preferably, the inducible promoter is readily controlled, such as being responsive to a nutrient in the host cell's medium.

In one embodiment, the vector containing a coding nucleic acid molecule will include a prokaryotic replicon, i.e., a DNA sequence having the ability to direct autonomous replication and maintenance of the recombinant DNA molecule extrachromosomally in a prokaryotic host cell, such as a bacterial host cell, transformed therewith. Such replicons are well known in the art. In addition, vectors that include a prokaryotic replicon may also include a gene whose expression confers a detectable marker such as a drug resistance. Typical bacterial drug resistance genes are those that confer resistance to ampicillin, kanamycin, chloramphenicol or tetracycline.

Vectors that include a prokaryotic replicon can further include a prokaryotic or bacteriophage promoter capable of directing the expression (transcription and translation) of the coding gene sequences in a bacterial host cell, such as E. coli. A promoter is an expression control element formed by a DNA sequence that permits binding of RNA polymerase and transcription to occur. Promoter sequences compatible with bacterial hosts are typically provided in plasmid vectors containing convenient restriction sites for insertion of a DNA segment of the present invention. Typical of such vector plasmids are pUC8, pUC9, pBR322 and pBR329, available from BioRad Laboratories (Richmond, Calif.), and pPL and pKK223, available from Pharmacia (Piscataway, N.J.).

Expression vectors compatible with eukaryotic cells, preferably those compatible with vertebrate cells, can also be used to form rDNA molecules that contain a coding sequence. Eukaryotic cell expression vectors, including viral vectors, are well known in the art and are available from several commercial sources. Typically, such vectors are provided containing convenient restriction sites for insertion of the desired DNA segment. Typical of such vectors are pSVL and pKSV-10 (Pharmacia), pBPV-1/pML2d (International Biotechnologies, Inc.), pTDT1 (ATCC, #31255), the vector pCDM8 described herein, and the like eukaryotic expression vectors. Vectors may be modified to include tissue specific promoters if needed.

Eukaryotic cell expression vectors used to construct the rDNA molecules of the present invention may further include a selectable marker that is effective in a eukaryotic cell, preferably a drug resistance selection marker. A preferred drug resistance marker is the gene whose expression results in neomycin resistance, i.e., the neomycin phosphotransferase (neo) gene (Southern et al. (1982), J. Mol. Appl. Genet. 1:327-341). Alternatively, the selectable marker can be present on a separate plasmid, and the two vectors are introduced by co-transfection of the host cell, and selected by culturing in the appropriate drug for the selectable marker.

E. Host Cells Containing an Exogenously Supplied Coding Nucleic Acid Molecule

The present invention further provides host cells transformed with a nucleic acid molecule that encodes a protein of the present invention. The host cell can be either prokaryotic or eukaryotic. Eukaryotic cells useful for expression of a protein of the invention are not limited, so long as the cell line is compatible with cell culture methods and compatible with the propagation of the expression vector and expression of the gene product. Preferred eukaryotic host cells include, but are not limited to, yeast, insect and mammalian cells, preferably vertebrate cells such as those from a mouse, rat, monkey or human cell line. Preferred eukaryotic host cells include Chinese hamster ovary (CHO) cells available from the ATCC as CCL61, NIH Swiss mouse embryo cells (NIH/3T3) available from the ATCC as CRL 1658, baby hamster kidney cells (BHK), and the like eukaryotic tissue culture cell lines.

Any prokaryotic host can be used to express a rDNA molecule encoding a protein of the invention. The preferred prokaryotic host is E. coli.

Transformation of appropriate cell hosts with a rDNA molecule of the present invention is accomplished by well known methods that typically depend on the type of vector used and host system employed. With regard to transformation of prokaryotic host cells, electroporation and salt treatment methods are typically employed (see, for example, Cohen et al. (1972), Proc. Natl. Acad. Sci. USA 69: 2110; and Sambrook et al., supra). With regard to transformation of vertebrate cells with vectors containing rDNAs, electroporation, cationic lipid or salt treatment methods are typically employed (see, for example, Graham et al. (1973), Virol. 52: 456; Wigler et al. (1979), Proc. Natl. Acad. Sci. USA 76:1373-1376).

Successfully transformed cells, i.e., cells that contain a rDNA molecule of the present invention, can be identified by well known techniques including the selection for a selectable marker. For example, cells resulting from the introduction of an rDNA of the present invention can be cloned to produce single colonies. Cells from those colonies can be harvested, lysed and their DNA content examined for the presence of the rDNA using a method such as that described by Southern (1975), J. Mol. Biol. 98: 503 or Berent et al. (1985), Biotech. 3: 208, or the proteins produced from the cell assayed via an immunological method.

F. Production of Recombinant Proteins Using a rDNA Molecule

The present invention further provides methods for producing a protein of the invention using nucleic acid molecules herein described. In general terms, the production of a recombinant form of a protein typically involves the following steps:

First, a nucleic acid molecule is obtained that encodes a protein of the invention, such as a nucleic acid molecule comprising, consisting essentially of or consisting of SEQ ID NO: 1, 3, 5, 7, 9, 11, 13 or 15, or nucleotides 390-4883 or 390-4880 of SEQ ID NO: 1, or nucleotides 12-4907 or 12-4904 of SEQ ID NO: 3, or nucleotides 424-1911 or 424-1908 of SEQ ID NO: 5, or nucleotides 405-1838 or 405-1835 of SEQ ID NO: 7, or nucleotides 89-1153 or 89-1150 of SEQ ID NO: 9, or nucleotides 223-1572 or 223-1569 of SEQ ID NO: 11, or nucleotides 418-1395 or 418-1392 of SEQ ID NO: 13, or nucleotides 271-1434 or 271-1431 of SEQ ID NO: 15. If the encoding sequence is uninterrupted by introns, as are these open-reading-frames, it is directly suitable for expression in any host.
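The paired coordinate ranges recited for each sequence differ by three nucleotides at the 3′ end. A minimal sketch of the arithmetic follows, under two assumptions that are inferences rather than statements of the specification: that the coordinates are 1-based and inclusive, and that the shorter range of each pair omits a three-nucleotide stop codon. The range for SEQ ID NO: 3 is taken as 12-4907, as recited for the reporter fusion assay format in section G. The names `ORF_COORDS` and `orf_summary` are hypothetical.

```python
# Open reading frame coordinates (start, end including the presumed stop codon),
# keyed by SEQ ID NO. Assumed to be 1-based and inclusive.
ORF_COORDS = {
    1: (390, 4883), 3: (12, 4907), 5: (424, 1911), 7: (405, 1838),
    9: (89, 1153), 11: (223, 1572), 13: (418, 1395), 15: (271, 1434),
}


def orf_summary(start, end_with_stop):
    """Summarize an ORF given 1-based inclusive coordinates."""
    length = end_with_stop - start + 1
    if length % 3 != 0:
        raise ValueError("ORF length is not a whole number of codons")
    codons = length // 3
    return {
        "nucleotides": length,
        "codons": codons,
        # Assumption: the final codon is a stop codon and encodes no residue.
        "residues": codons - 1,
        "end_without_stop": end_with_stop - 3,
    }


summaries = {seq_id: orf_summary(*coords) for seq_id, coords in ORF_COORDS.items()}
```

Under these assumptions every recited range is a whole number of codons, and the computed `end_without_stop` value for each sequence matches the second coordinate of the corresponding pair.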

The nucleic acid molecule is then preferably placed in operable linkage with suitable control sequences, as described above, to form an expression unit containing the protein open reading frame. The expression unit is used to transform a suitable host and the transformed host is cultured under conditions that allow the production of the recombinant protein. Optionally the recombinant protein is isolated from the medium or from the cells; recovery and purification of the protein may not be necessary in some instances where some impurities may be tolerated.

Each of the foregoing steps can be done in a variety of ways. For example, the desired coding sequences may be obtained from genomic fragments and used directly in appropriate hosts. The construction of expression vectors that are operable in a variety of hosts is accomplished using appropriate replicons and control sequences, as set forth above. The control sequences, expression vectors, and transformation methods are dependent on the type of host cell used to express the gene and were discussed in detail earlier. Suitable restriction sites can, if not normally available, be added to the ends of the coding sequence so as to provide an excisable gene to insert into these vectors. A skilled artisan can readily adapt any host/expression system known in the art for use with the nucleic acid molecules of the invention to produce recombinant protein.

G. Methods to Identify Agents that Modulate the Expression of a Nucleic Acid Encoding the Genes Associated with Cancer

Another embodiment of the present invention provides methods for identifying agents that modulate the expression of a nucleic acid encoding a protein of the invention such as a protein having the amino acid sequence of SEQ ID NO: 2, 4, 6, 8, 10, 12, 14 or 16. Such assays may utilize any available means of monitoring for changes in the expression level of the nucleic acids of the invention. As used herein, an agent is said to modulate the expression of a nucleic acid of the invention if it is capable of up- or down-regulating expression of the nucleic acid in a cell.

In one assay format, cell lines that contain reporter gene fusions between nucleotides from within the open reading frame defined by nucleotides 390-4883 of SEQ ID NO: 1, nucleotides 12-4907 of SEQ ID NO: 3, nucleotides 424-1911 of SEQ ID NO: 5, nucleotides 405-1838 of SEQ ID NO: 7, nucleotides 89-1153 of SEQ ID NO: 9, nucleotides 223-1572 of SEQ ID NO: 11, nucleotides 418-1395 of SEQ ID NO: 13, nucleotides 271-1434 of SEQ ID NO: 15, and/or the 5′ and/or 3′ regulatory elements and any assayable fusion partner may be prepared. Numerous assayable fusion partners are known and readily available, including the firefly luciferase gene and the gene encoding chloramphenicol acetyltransferase (Alam et al. (1990), Anal. Biochem. 188: 245-254). Cell lines containing the reporter gene fusions are then exposed to the agent to be tested under appropriate conditions and time. Differential expression of the reporter gene between samples exposed to the agent and control samples identifies agents which modulate the expression of a nucleic acid of the invention.
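The comparison of reporter signal between agent-exposed and control samples can be sketched as a simple fold-change calculation. This is an illustrative sketch only: the two-fold threshold, the function names and the replicate readings are assumptions made for the example, and the specification does not prescribe any particular cutoff or statistic.

```python
from statistics import mean


def fold_change(treated, control):
    """Ratio of mean reporter signal in treated samples to that in controls."""
    control_mean = mean(control)
    if control_mean <= 0:
        raise ValueError("control signal must be positive")
    return mean(treated) / control_mean


def classify(treated, control, threshold=2.0):
    """Label an agent by the direction of change; threshold is an assumed cutoff."""
    fc = fold_change(treated, control)
    if fc >= threshold:
        return "up-regulated"
    if fc <= 1.0 / threshold:
        return "down-regulated"
    return "no change"


# Hypothetical luciferase readings (arbitrary units), three replicates each.
control_wells = [100, 110, 90]
agent_a_wells = [300, 320, 310]
agent_b_wells = [40, 50, 45]
```

In practice, replicate variability would also be assessed with an appropriate statistical test before an agent is called a modulator.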

Additional assay formats may be used to monitor the ability of the agent to modulate the expression of a nucleic acid encoding a protein of the invention, such as the protein having SEQ ID NO: 2, 4, 6, 8, 10, 12, 14 or 16. For instance, mRNA expression may be monitored directly by hybridization to the nucleic acids of the invention. Cell lines are exposed to the agent to be tested under appropriate conditions and time, and total RNA or mRNA is isolated by standard procedures such as those disclosed in Sambrook et al., Molecular Cloning: A Laboratory Manual, Third Ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., 2001.

The preferred cells will be those derived from human tissue, for instance, biopsy tissue or cultured cells from patients with cancer. Cell lines such as ATCC breast ductal carcinoma cell lines (Catalogue Nos. CRL-2320, CRL-2338, and CRL-7345), ATCC colorectal adenocarcinoma cell lines (Catalogue Nos. CCL-222, CCL-224, CCL-225, CCL-234, CRL-7159, and CRL-7184), ATCC kidney clear cell carcinoma cell lines (Catalogue Nos. HTB-46 and HTB-47), ATCC renal cell adenocarcinoma cell lines (Catalogue Nos. CRL-1611, CRL-1932 and CRL-1933), ATCC liver hepatocellular carcinoma cell lines (Catalogue Nos. CRL-2233, CRL-2234, and HB-8065), ATCC lung adenocarcinoma cell lines (Catalogue Nos. CRL-5944, CRL-7380, and CRL-5907), ATCC lymphoma cell lines (Catalogue Nos. CRL-7936, CRL-7264, and CRL-7507), ATCC ovary adenocarcinoma cell lines (Catalogue Nos. HTB-161, HTB-75, and HTB-76), ATCC pancreas adenocarcinoma cell lines (Catalogue Nos. CRL-1687, CRL-2119, and HTB-79), prostate adenocarcinoma cell lines (Catalogue Nos. CRL-1435, CRL-2422, and CRL-2220), and ATCC gastric adenocarcinoma cell lines (Catalogue Nos. CRL-1739, CRL-1863, and CRL-1864) may be used. Alternatively, other available cells or cell lines may be used.

Probes to detect differences in RNA expression levels between cells exposed to the agent and control cells may be prepared from the nucleic acids of the invention. It is preferable, but not necessary, to design probes which hybridize only with target nucleic acids under conditions of high stringency. Only highly complementary nucleic acid hybrids form under conditions of high stringency. Accordingly, the stringency of the assay conditions determines the amount of complementarity which should exist between two nucleic acid strands in order to form a hybrid. Stringency should be chosen to maximize the difference in stability between the probe:target hybrid and probe:non-target hybrids.

Probes may be designed from the nucleic acids of the invention through methods known in the art. For instance, the G+C content of the probe and the probe length can affect probe binding to its target sequence. Methods to optimize probe specificity are commonly available in Sambrook et al., supra, or Ausubel et al., Short Protocols in Molecular Biology, Fourth Ed., John Wiley & Sons, Inc., New York, 1999.
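The influence of G+C content and probe length on hybrid stability can be illustrated with a minimal sketch. It uses the Wallace rule of thumb for short oligonucleotides, Tm ≈ 2 °C × (A+T) + 4 °C × (G+C), which is a common approximation brought in here for illustration and is not a method recited in this specification. The function names and the 18-nucleotide probe are hypothetical.

```python
def gc_fraction(probe):
    """Fraction of G and C bases in the probe."""
    probe = probe.upper()
    return (probe.count("G") + probe.count("C")) / len(probe)


def wallace_tm(probe):
    """Approximate melting temperature in degrees C for a short oligonucleotide.

    Wallace rule: 2 degrees per A/T base plus 4 degrees per G/C base.
    Intended only as a rough estimate for oligomers of about 14-20 nt.
    """
    probe = probe.upper()
    at = probe.count("A") + probe.count("T")
    gc = probe.count("G") + probe.count("C")
    return 2 * at + 4 * gc


# Hypothetical 18-nt probe, for illustration only.
probe = "ATGGCCAAGCTGACCGGT"
tm = wallace_tm(probe)
gc = gc_fraction(probe)
```

A longer probe or one with higher G+C content gives a higher estimate, consistent with the more stable hybrid expected under stringent conditions.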

Hybridization conditions are modified using known methods, such as those described by Sambrook et al. and Ausubel et al., as required for each probe. Hybridization of total cellular RNA or RNA enriched for polyA RNA can be accomplished in any available format. For instance, total cellular RNA or RNA enriched for polyA RNA can be affixed to a solid support and the solid support exposed to at least one probe comprising at least one, or part of one, of the sequences of the invention under conditions in which the probe will specifically hybridize. Alternatively, nucleic acid fragments comprising at least one, or part of one, of the sequences of the invention can be affixed to a solid support, such as a silicon chip, porous glass wafer or membrane. The solid support can then be exposed to total cellular RNA or polyA RNA from a sample under conditions in which the affixed sequences will specifically hybridize. Such solid supports and hybridization methods are widely available, for example, those disclosed by Beattie (1995), WO 95/11755. By examining for the ability of a given probe to specifically hybridize to an RNA sample from an untreated cell population and from a cell population exposed to the agent, agents which up- or down-regulate the expression of a nucleic acid encoding the protein having the sequence of SEQ ID NO: 2, 4, 6, 8, 10, 12, 14 or 16 are identified.

Hybridization for qualitative and quantitative analysis of mRNAs may also be carried out by using a RNase Protection Assay (i.e., RPA, see Ma et al. (1996), Methods 10: 273-238). Briefly, an expression vehicle comprising cDNA encoding the gene product and a phage specific DNA dependent RNA polymerase promoter (e.g., T7, T3 or SP6 RNA polymerase) is linearized at the 3′ end of the cDNA molecule, downstream from the phage promoter, wherein such a linearized molecule is subsequently used as a template for synthesis of a labeled antisense transcript of the cDNA by in vitro transcription. The labeled transcript is then hybridized to a mixture of isolated RNA (i.e., total or fractionated mRNA) by incubation at 45° C. overnight in a buffer comprising 80% formamide, 40 mM Pipes, pH 6.4, 0.4 M NaCl and 1 mM EDTA. The resulting hybrids are then digested in a buffer comprising 40 μg/ml ribonuclease A and 2 μg/ml ribonuclease. After deactivation and extraction of extraneous proteins, the samples are loaded onto urea/polyacrylamide gels for analysis.

In another assay, to identify agents which affect the expression of the instant gene products, cells or cell lines are first identified which express the gene products of the invention physiologically. Cells and/or cell lines so identified would be expected to comprise the necessary cellular machinery such that the fidelity of modulation of the transcriptional apparatus is maintained with regard to exogenous contact of agent with appropriate surface transduction mechanisms and/or the cytosolic cascades. Further, such cells or cell lines would be transduced or transfected with an expression vehicle (e.g., a plasmid or viral vector) construct comprising an operable non-translated 5′ promoter-containing end of the structural gene encoding the instant gene products fused to one or more antigenic fragments, which are peculiar to the instant gene products, wherein said fragments are under the transcriptional control of said promoter and are expressed as polypeptides whose molecular weight can be distinguished from the naturally occurring polypeptides or may further comprise an immunologically distinct tag or other detectable marker. Such a process is well known in the art (see Sambrook et al., supra).

Cells or cell lines transduced or transfected as outlined above are then contacted with agents under appropriate conditions. For example, the agent in a pharmaceutically acceptable excipient is contacted with cells in an aqueous physiological buffer such as phosphate buffered saline (PBS) at physiological pH, Eagles balanced salt solution (BSS) at physiological pH, PBS or BSS comprising serum, or conditioned media comprising PBS or BSS and/or serum incubated at 37° C. Said conditions may be modulated as deemed necessary by one of skill in the art. Subsequent to contacting the cells with the agent, said cells will be disrupted and the polypeptides of the lysate are fractionated such that a polypeptide fraction is pooled and contacted with an antibody to be further processed by immunological assay (e.g., ELISA, immunoprecipitation or Western blot). The pool of proteins isolated from the “agent-contacted” sample will be compared with a control sample where only the excipient is contacted with the cells, and an increase or decrease in the immunologically generated signal from the “agent-contacted” sample compared to the control will be used to distinguish the effectiveness of the agent.

H. Methods to Identify Agents that Modulate the Level or at Least One Activity of the Cancer Associated Proteins

Another embodiment of the present invention provides methods for identifying agents that modulate the level or at least one activity of a protein of the invention such as the protein having the amino acid sequence of SEQ ID NO: 2, 4, 6, 8, 10, 12, 14 or 16. Such methods or assays may utilize any means of monitoring or detecting the desired activity and are particularly useful for identifying agents that treat cancer.

In one format, the relative amounts of a protein of the invention in a cell population that has been exposed to the agent to be tested and in an unexposed control cell population may be assayed. In this format, probes such as specific antibodies are used to monitor the differential expression of the protein in the different cell populations. Cell lines or populations are exposed to the agent to be tested under appropriate conditions and time. Cellular lysates may be prepared from the exposed cell line or population and a control, unexposed cell line or population. The cellular lysates are then analyzed with the probe.

Antibody probes are prepared by immunizing suitable mammalian hosts in appropriate immunization protocols using the peptides, polypeptides or proteins of the invention if they are of sufficient length, or, if desired, or if required to enhance immunogenicity, conjugated to suitable carriers. Methods for preparing immunogenic conjugates with carriers such as BSA, KLH, or other carrier proteins are well known in the art. In some circumstances, direct conjugation using, for example, carbodiimide reagents may be effective; in other instances linking reagents such as those supplied by Pierce Chemical Co. (Rockford, Ill.) may be desirable to provide accessibility to the hapten. The hapten peptides can be extended at either the amino or carboxy terminus with a cysteine residue or interspersed with cysteine residues, for example, to facilitate linking to a carrier. Administration of the immunogens is conducted generally by injection over a suitable time period and with use of suitable adjuvants, as is generally understood in the art. During the immunization schedule, titers of antibodies are taken to determine adequacy of antibody formation.

While the polyclonal antisera produced in this way may be satisfactory for some applications, for pharmaceutical compositions, use of monoclonal preparations is preferred. Immortalized cell lines which secrete the desired monoclonal antibodies may be prepared using the standard method of Kohler and Milstein ((1975) Nature 256: 495-497) or modifications which effect immortalization of lymphocytes or spleen cells, as is generally known. The immortalized cell lines secreting the desired antibodies are screened by immunoassay in which the antigen is the peptide hapten, polypeptide or protein. When the appropriate immortalized cell culture secreting the desired antibody is identified, the cells can be cultured either in vitro or by production in ascites fluid.

The desired monoclonal antibodies are then recovered from the culture supernatant or from the ascites supernatant. Fragments of the monoclonal antibodies or the polyclonal antisera which contain the immunologically significant (antigen-binding) portion can be used as antagonists, as well as the intact antibodies. Use of immunologically reactive (antigen-binding) antibody fragments, such as the Fab, Fab′, or F(ab′)₂ fragments, is often preferable, especially in a therapeutic context, as these fragments are generally less immunogenic than the whole immunoglobulin.

The antibodies or antigen-binding fragments may also be produced, using current technology, by recombinant means. Antibody regions that bind specifically to the desired regions of the protein can also be produced in the context of chimeras with multiple species origin, such as humanized antibodies.

Agents that are assayed in the above method can be randomly selected or rationally selected or designed. As used herein, an agent is said to be randomly selected when the agent is chosen randomly without considering the specific sequences involved in the association of a protein of the invention alone or with its associated substrates, binding partners, etc. An example of randomly selected agents is the use of a chemical library or a peptide combinatorial library, or a growth broth of an organism.

As used herein, an agent is said to be rationally selected or designed when the agent is chosen on a nonrandom basis which takes into account the sequence of the target site and/or its conformation in connection with the agent's action. Agents can be rationally selected or rationally designed by utilizing the peptide sequences that make up these sites. For example, a rationally selected peptide agent can be a peptide whose amino acid sequence is identical to or a derivative of any functional consensus site.

The agents of the present invention can be, as examples, peptides, small molecules, vitamin derivatives, as well as carbohydrates. Dominant negative proteins, DNAs encoding these proteins, antibodies to these proteins, peptide fragments of these proteins or mimics of these proteins may be introduced into cells to affect function. “Mimic” as used herein refers to the modification of a region or several regions of a peptide molecule to provide a structure chemically different from the parent peptide but topographically and functionally similar to the parent peptide (see Grant in: Molecular Biology and Biotechnology, Meyers, ed., pp. 659-664, VCH Publishers, Inc., New York, 1995). A skilled artisan can readily recognize that there is no limit as to the structural nature of the agents of the present invention.

The peptide agents of the invention can be prepared using standard solid phase (or solution phase) peptide synthesis methods, as is known in the art. In addition, the DNA encoding these peptides may be synthesized using commercially available oligonucleotide synthesis instrumentation and produced recombinantly using standard recombinant production systems. The production using solid phase peptide synthesis is necessitated if non-gene-encoded amino acids are to be included.

Another class of agents of the present invention are antibodies immunoreactive with critical positions of proteins of the invention, e.g., cytoplasmic domain, spacer domain, α-helical coiled-coil domain, or the receptor domain, as described herein. Antibody agents are obtained by immunization of suitable mammalian subjects with peptides containing, as antigenic regions, those portions of the protein intended to be targeted by the antibodies.

I. Uses for Agents that Modulate the Expression or at Least One Activity of the Proteins Associated with Cancer

As provided in the Examples, the proteins and nucleic acids of the invention, such as the proteins having the amino acid sequence of SEQ ID NO: 2, 4, 6, 8, 10, 12, 14 or 16, are differentially expressed in cancerous tissue. Agents that up- or down-regulate or modulate the expression of the protein or at least one activity of the protein, such as agonists or antagonists, may be used to modulate biological and pathologic processes associated with the protein's function and activity. This includes agents identified employing homologues and analogues of the present invention.

As used herein, a subject can be any mammal, so long as the mammal is in need of modulation of a pathological or biological process mediated by a protein of the invention. The term “mammal” is defined as an individual belonging to the class Mammalia. The invention is particularly useful in the treatment of human subjects.

Pathological processes refer to a category of biological processes which produce a deleterious effect. For example, expression of a protein of the invention may be associated with cell growth or hyperplasia. As used herein, an agent is said to modulate a pathological process when the agent reduces the degree or severity of the process. For instance, cancer may be prevented or disease progression modulated by the administration of agents which up- or down-regulate or modulate in some way the expression or at least one activity of a protein of the invention.

The agents of the present invention can be provided alone, or in combination with other agents that modulate a particular pathological process. For example, an agent of the present invention can be administered in combination with other known drugs. As used herein, two agents are said to be administered in combination when the two agents are administered simultaneously or are administered independently in a fashion such that the agents will act at the same time.

The agents of the present invention can be administered via parenteral, subcutaneous, intravenous, intramuscular, intraperitoneal, transdermal, or buccal routes. Alternatively, or concurrently, administration may be by the oral route. The dosage administered will be dependent upon the age, health, and weight of the recipient, kind of concurrent treatment, if any, frequency of treatment, and the nature of the effect desired.

The present invention further provides compositions containing one or more agents which modulate expression or at least one activity of a protein of the invention. While individual needs vary, determination of optimal ranges of effective amounts of each component is within the skill of the art. Typical dosages comprise 0.1 to 100 μg/kg body wt. The preferred dosages comprise 0.1 to 10 μg/kg body wt. The most preferred dosages comprise 0.1 to 1 μg/kg body wt.

In addition to the pharmacologically active agent, the compositions of the present invention may contain suitable pharmaceutically acceptable carriers comprising excipients and auxiliaries which facilitate processing of the active compounds into preparations which can be used pharmaceutically for delivery to the site of action. Suitable formulations for parenteral administration include aqueous solutions of the active compounds in water-soluble form, for example, water-soluble salts. In addition, suspensions of the active compounds as appropriate oily injection suspensions may be administered. Suitable lipophilic solvents or vehicles include fatty oils, for example, sesame oil, or synthetic fatty acid esters, for example, ethyl oleate or triglycerides. Aqueous injection suspensions may contain substances which increase the viscosity of the suspension, including, for example, sodium carboxymethyl cellulose, sorbitol, and/or dextran. Optionally, the suspension may also contain stabilizers. Liposomes can also be used to encapsulate the agent for delivery into the cell.

The pharmaceutical formulation for systemic administration according to the invention may be formulated for enteral, parenteral or topical administration. Indeed, all three types of formulations may be used simultaneously to achieve systemic administration of the active ingredient.

Suitable formulations for oral administration include hard or soft gelatin capsules, pills, tablets, including coated tablets, elixirs, suspensions, syrups or inhalations and controlled release forms thereof.

In practicing the methods of this invention, the compounds of this invention may be used alone or in combination, or in combination with other therapeutic or diagnostic agents. In certain preferred embodiments, the compounds of this invention may be coadministered along with other compounds typically prescribed for these conditions according to generally accepted medical practice. The compounds of this invention can be utilized in vivo, ordinarily in mammals, such as humans, sheep, horses, cattle, pigs, dogs, cats, rats and mice, or in vitro.

J. Methods to Identify Binding Partners

Another embodiment of the present invention provides methods for isolating and identifying binding partners of proteins of the invention. In general, a protein of the invention is mixed with a potential binding partner or an extract or fraction of a cell under conditions that allow the association of potential binding partners with the protein of the invention. After mixing, peptides, polypeptides, proteins or other molecules that have become associated with a protein of the invention are separated from the mixture. The binding partner that bound to the protein of the invention can then be removed and further analyzed. To identify and isolate a binding partner, the entire protein, for instance a protein comprising the entire amino acid sequence of SEQ ID NO: 2, 4, 6, 8, 10, 12, 14 or 16, can be used. Alternatively, a fragment of the protein can be used.

As used herein, a cellular extract refers to a preparation or fraction which is made from a lysed or disrupted cell. The preferred source of cellular extracts will be cells derived from human tumors or transformed cells, for instance, biopsy tissue or tissue culture cells from carcinomas. Alternatively, cellular extracts may be prepared from normal tissue or available cell lines.

A variety of methods can be used to obtain an extract of a cell. Cells can be disrupted using either physical or chemical disruption methods. Examples of physical disruption methods include, but are not limited to, sonication and mechanical shearing. Examples of chemical lysis methods include, but are not limited to, detergent lysis and enzyme lysis. A skilled artisan can readily adapt methods for preparing cellular extracts in order to obtain extracts for use in the present methods.

Once an extract of a cell is prepared, the extract is mixed with the protein of the invention under conditions in which association of the protein with the binding partner can occur. A variety of conditions can be used, the most preferred being conditions that closely resemble conditions found in the cytoplasm of a human cell. Features such as osmolarity, pH, temperature, and the concentration of cellular extract used can be varied to optimize the association of the protein with the binding partner.

After mixing under appropriate conditions, the bound complex is separated from the mixture. A variety of techniques can be utilized to separate the mixture. For example, antibodies specific to a protein of the invention can be used to immunoprecipitate the binding partner complex. Alternatively, standard chemical separation techniques such as chromatography and density/sediment centrifugation can be used.

After removal of non-associated cellular constituents found in the extract, the binding partner can be dissociated from the complex using conventional methods. For example, dissociation can be accomplished by altering the salt concentration or pH of the mixture.

To aid in separating associated binding partner pairs from the mixed extract, the protein of the invention can be immobilized on a solid support. For example, the protein can be attached to a nitrocellulose matrix or acrylic beads. Attachment of the protein to a solid support aids in separating peptide/binding partner pairs from other constituents found in the extract. The identified binding partners can be either a single protein or a complex made up of two or more proteins. Alternatively, binding partners may be identified using a Far-Western assay according to the procedures of Takayama et al. (1997), Methods Mol. Biol. 69: 171-184 or Sauder et al. (1996), J. Gen. Virol. 77: 991-996, or identified through the use of epitope tagged proteins or GST fusion proteins.

Alternatively, the nucleic acid molecules of the invention can be used in a yeast two-hybrid system or other in vivo protein-protein detection system. The yeast two-hybrid system has been used to identify other protein partner pairs and can readily be adapted to employ the nucleic acid molecules herein described.

K. Use of the Binding Partners of the Cancer Associated Proteins

Once isolated, the binding partners of the proteins of the invention, and homologues and analogues thereof, obtained using the above described methods can be used for a variety of purposes. The binding partners can be used to generate antibodies that bind to the binding partner using techniques known in the art. Antibodies that bind the binding partner can be used to assay the activity of the protein of the invention, as a therapeutic agent to modulate a biological or pathological process mediated by the protein of the invention, or to purify the binding partner. These uses are described in detail below.

L. Methods to Identify Agents that Block the Associations between the Binding Partners and the Cancer Associated Proteins

Another embodiment of the present invention provides methods for identifying agents that reduce or block the association of a protein of the invention with a binding partner. Specifically, a protein of the invention is mixed with a binding partner in the presence and absence of an agent to be tested. After mixing under conditions that allow association of the proteins, the two mixtures are analyzed and compared to determine if the agent reduced or blocked the association of the protein of the invention with the binding partner. Agents that block or reduce the association of the protein of the invention with the binding partner will be identified as decreasing the amount of association present in the sample containing the tested agent.

As used herein, an agent is said to reduce or block the association between a protein of the invention and a binding partner when the presence of the agent decreases the extent to which the binding partner becomes associated with the protein of the invention, or prevents that association altogether. One class of agents will reduce or block the association by binding to the binding partner, while another class of agents will reduce or block the association by binding to the protein of the invention.
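The comparison of the two mixtures can be illustrated with a minimal sketch. It assumes that the amount of association in each mixture has already been reduced to a single number (for example, a signal from bound complex); the function names and the 20% cutoff are hypothetical choices for illustration and are not taken from this disclosure.

```python
def percent_inhibition(assoc_without_agent, assoc_with_agent):
    """Percent decrease in association caused by the tested agent."""
    if assoc_without_agent <= 0:
        raise ValueError("association without agent must be positive")
    return 100.0 * (assoc_without_agent - assoc_with_agent) / assoc_without_agent


def reduces_association(assoc_without_agent, assoc_with_agent, min_inhibition=20.0):
    """True if the agent lowers association by at least min_inhibition percent.

    min_inhibition is an illustrative cutoff, not a value from the source.
    """
    return percent_inhibition(assoc_without_agent, assoc_with_agent) >= min_inhibition
```

In practice the cutoff would be set from the variability of replicate control mixtures rather than fixed in advance.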

The binding partner used in the above assay can either be an isolated and fully characterized protein or can be a partially characterized protein that binds to the protein of the invention or a binding partner that has been identified as being present in a cellular extract. It will be apparent to one of ordinary skill in the art that so long as the binding partner has been characterized by an identifiable property, e.g., molecular weight, the present assay can be used.

Agents that are assayed in the above method can be randomly selected or rationally selected or designed. As used herein, an agent is said to be randomly selected when the agent is chosen randomly without considering the specific sequences involved in the association of the protein of the invention with the binding partner. Examples of randomly selected agents include those drawn from a chemical library, a peptide combinatorial library, or a growth broth of an organism.

As used herein, an agent is said to be rationally selected or designed when the agent is chosen on a nonrandom basis which takes into account the sequence of the target site and/or its conformation in connection with the agent's action. Agents can be rationally selected or rationally designed by utilizing the peptide sequences that make up the contact sites of the binding partner with the protein of the invention. For example, a rationally selected peptide agent can be a peptide whose amino acid sequence is identical to the contact site of the protein of the invention on the binding partner. Such an agent will reduce or block the association of the protein of the invention with the binding partner by binding to the binding partner.

The agents of the present invention can be, as examples, peptides, small molecules, vitamin derivatives, as well as carbohydrates. A skilled artisan can readily recognize that there is no limit as to the structural nature of the agents of the present invention.

One class of agents of the present invention are peptide agents whose amino acid sequences are chosen based on the amino acid sequence of the protein of the invention. The peptide agents of the invention can be prepared using standard solid phase (or solution phase) peptide synthesis methods, as is known in the art. In addition, the DNA encoding these peptides may be synthesized using commercially available oligonucleotide synthesis instrumentation and produced recombinantly using standard recombinant production systems. Production by solid phase peptide synthesis is necessary if non-gene-encoded amino acids are to be included.

Another class of agents of the present invention are antibodies immunoreactive with critical positions of the protein of the invention or the binding partner. As described above, antibodies are obtained by immunization of suitable mammalian subjects with peptides containing, as antigenic regions, those portions of the protein of the invention or the binding partner intended to be targeted by the antibodies. Critical regions include the contact sites involved in the association of the protein of the invention with the binding partner.

As discussed below, the important minimal sequence of residues involved in activity of the protein of the invention defines a functional linear domain that can be effectively used as a bait for two-hybrid screening and identification of potential associated molecules. Use of such fragments will significantly increase the specificity of the screening as opposed to using the full-length molecule and is therefore preferred. Similarly, this linear sequence can also be used as an affinity matrix to isolate binding proteins using a biochemical affinity purification strategy.

M. Uses for Agents that Block the Associations between the Binding Partners and the Cancer Associated Proteins

As provided in the Examples, the proteins and nucleic acids of the invention, such as the proteins having the amino acid sequence of SEQ ID NO: 2, 4, 6, 8, 10, 12, 14 or 16, are differentially expressed in cancerous tissue. Agents that reduce or block the interactions of a protein of the invention, including those identified employing homologues and analogues of the protein, with a binding partner may be used to modulate biological and pathologic processes associated with the protein's function and activity.

As used herein, a subject can be any mammal, so long as the mammal is in need of modulation of a pathological or biological process mediated by a protein of the invention. The term “mammal” means an individual belonging to the class Mammalia. The invention is particularly useful in the treatment of human subjects.

Pathological processes refer to a category of biological processes which produce a deleterious effect. For example, expression of a protein of the invention may be associated with cell growth or hyperplasia. As used herein, an agent is said to modulate a pathological process when the agent reduces the degree or severity of the process. For instance, cancer may be prevented or disease progression modulated by the administration of agents that reduce or block the interactions of a protein of the invention with a binding partner.

The agents of the present invention can be administered via parenteral, subcutaneous, intravenous, intramuscular, intraperitoneal, transdermal, or buccal routes. Alternatively, or concurrently, administration may be by the oral route. The dosage administered will be dependent upon the age, health, and weight of the recipient, kind of concurrent treatment, if any, frequency of treatment, and the nature of the effect desired.

The present invention further provides compositions containing one or more agents that block association of a protein of the invention with a binding partner. While individual needs vary, determination of optimal ranges of effective amounts of each component is within the skill of the art. Typical dosages comprise 0.1 to 100 μg/kg body wt. The preferred dosages comprise 0.1 to 10 μg/kg body wt. The most preferred dosages comprise 0.1 to 1 μg/kg body wt.

In addition to the pharmacologically active agent, the compositions of the present invention may contain suitable pharmaceutically acceptable carriers comprising excipients and auxiliaries which facilitate processing of the active compounds into preparations which can be used pharmaceutically for delivery to the site of action. Suitable formulations for parenteral administration include aqueous solutions of the active compounds in water soluble form, for example, water soluble salts. In addition, suspensions of the active compounds as appropriate oily injection suspensions may be administered. Suitable lipophilic solvents or vehicles include fatty oils, for example, sesame oil, or synthetic fatty acid esters, for example, ethyl oleate or triglycerides. Aqueous injection suspensions may contain substances which increase the viscosity of the suspension, including, for example, sodium carboxymethyl cellulose, sorbitol, and/or dextran. Optionally, the suspension may also contain stabilizers. Liposomes can also be used to encapsulate the agent for delivery into the cell.

The pharmaceutical formulation for systemic administration according to the invention may be formulated for enteral, parenteral or topical administration. Indeed, all three types of formulations may be used simultaneously to achieve systemic administration of the active ingredient.

Suitable formulations for oral administration include hard or soft gelatin capsules, pills, tablets, including coated tablets, elixirs, suspensions, syrups or inhalations and controlled release forms thereof.

In practicing the methods of this invention, the compounds of this invention may be used alone or in combination, or in combination with other therapeutic or diagnostic agents. In certain preferred embodiments, the compounds of this invention may be coadministered along with other compounds typically prescribed for these conditions according to generally accepted medical practice. The compounds of this invention can be utilized in vivo, ordinarily in mammals, such as humans, sheep, horses, cattle, pigs, dogs, cats, rats and mice, or in vitro.

N. Rational Drug Design and Combinatorial Chemistry

The present invention further encompasses rational drug design and combinatorial chemistry. Those of skill will recognize appropriate methods to utilize and exploit aspects of the present invention in identifying compounds which can be developed for cancer treatment. Rational drug design involving polypeptides requires identifying and defining a first peptide with which the designed drug is to interact, and using the first target peptide to define the requirements for a second peptide. With such requirements defined, one can find or prepare an appropriate peptide or non-peptide that meets all or substantially all of the defined requirements. Thus, one goal of rational drug design is to produce structural or functional analogs of biologically active polypeptides of interest or of small molecules with which they interact (e.g., agonists, antagonists, null compounds) in order to fashion drugs that are, for example, more or less potent forms of the ligand. (See, e.g., Hodgson (1991), Bio/Technology 9:19-21). Combinatorial chemistry is the science of synthesizing and testing compounds for bioactivity en masse, instead of one by one, the aim being to discover drugs and materials more quickly and inexpensively than was formerly possible. Rational drug design and combinatorial chemistry have become more intimately related in recent years due to the development of approaches in computer-aided protein modeling and drug discovery. (See, e.g., U.S. Pat. Nos. 4,908,773; 5,884,230; 5,873,052; 5,331,573; and 5,888,738).

The use of molecular modeling as a tool for rational drug design and combinatorial chemistry has dramatically increased due to the advent of computer graphics. Not only is it possible to view molecules on computer screens in three dimensions, but it is also possible to examine the interactions of macromolecules such as enzymes and receptors and rationally designed derivative molecules to test. (See Boorman (1992), Chem. Eng. News 70:18-26). A vast amount of user-friendly software and hardware is now available and virtually all pharmaceutical companies have computer modeling groups devoted to rational drug design. Molecular Simulations Inc. (www.msi.com), for example, sells several sophisticated programs that allow a user to start from an amino acid sequence, build a two or three-dimensional model of the protein or polypeptide, compare it to other two and three-dimensional models, and analyze the interactions of compounds, drugs, and peptides with a three-dimensional model in real time. Accordingly, in some embodiments of the invention, software is used to compare regions of the invention protein and molecules that interact therewith (collectively referred to as “binding partners”; e.g., anti-protein antibodies), and fragments or derivatives of these molecules, with other molecules, such as peptides, peptidomimetics, and chemicals, so that therapeutic interactions can be predicted and designed. (See Schneider (1998), Genetic Engineering News December: page 20; Tempczyk et al. (1997), Molecular Simulations Inc. Solutions April; and Butenhof (1998), Molecular Simulations Inc. Case Notes (August 1998) for a discussion of molecular modeling).

O. Gene Therapy

In another embodiment, genetic therapy can be used as a means for modulating biological and pathologic processes associated with the protein's function and activity. This comprises inserting into a cancerous cell a gene construct encoding a protein comprising all or at least a portion of the sequences of SEQ ID NO: 2, 4, 6, 8, 10, 12, 14 or 16, or alternatively a gene construct comprising all or a portion of the non-coding region of SEQ ID NO: 1, 3, 5, 7, 9, 11, 13 or 15, operably linked to a promoter or enhancer element such that expression of said protein causes suppression of said cancer and wherein said promoter or enhancer element is a promoter or enhancer element modulating said gene construct.

In the constructs described, expression of said protein can be directed from any suitable promoter (e.g., the human cytomegalovirus (CMV), simian virus 40 (SV40), or metallothionein promoters), and regulated by any appropriate mammalian regulatory element. For example, if desired, enhancers known to preferentially direct gene expression in neural cells, T cells, or B cells may be used to direct the expression. The enhancers used could include, without limitation, those that are characterized as tissue or cell specific in their expression. Alternatively, if a genomic clone of LFG1, LFG2, LFG3, LFG4, LFG5 or LFG6 is used as a therapeutic construct (for example, following its isolation by hybridization with the nucleic acid molecule of the invention described above), regulation may be mediated by the cognate regulatory sequences or, if desired, by regulatory sequences derived from a heterologous source, including any of the promoters or regulatory elements described above.

Insertion of the construct into a cancerous cell is accomplished in vivo, for example using a viral or plasmid vector. Such methods can also be applied to in vitro uses. Thus, the methods of the present invention are readily applicable to different forms of gene therapy, either where cells are genetically modified ex vivo and then administered to a host or where the gene modification is conducted in vivo using any of a number of suitable methods involving vectors especially suitable to such therapies.

Retroviral vectors, adenoviral vectors, adeno-associated viral vectors, or other viral vectors with the appropriate tropism for cells likely to be involved in cancer (for example, epithelial cells) may be used as a gene transfer delivery system for a therapeutic gene construct. Numerous vectors useful for this purpose are generally known (Cozzi P J, et al., (2002) Prostate, 53(2):95-100; Bitzer M, Lauer U., (2002) Dtsch Med Wochenschr. 127(31-32):1623-1624; Mezzina and Danos (2002), Trends Genet. 8:241-256; Loser et al. (2002) Curr. Gene Ther. 2:161-171; Pfeifer and Verma (2001), Annu. Rev. Genomics Hum. Genet. 2:177-211). Retroviral vectors are particularly well developed and have been used in clinical settings (Anderson et al. (1995), U.S. Pat. No. 5,399,346). Non-viral approaches may also be employed for the introduction of therapeutic DNA into cells otherwise predicted to undergo cancer (Jeschke et al. (2001) Curr. Gene Ther. 1:267-278; Wu et al. (1988), J. Biol. Chem. 263:14621-14624; Wu et al. (1989), J. Biol. Chem. 264:16985-16987). For example, a gene may be introduced into a neuron or a T cell by lipofection, asialoorosomucoid-polylysine conjugation, or, less preferably, microinjection under surgical conditions.

For any of the methods of application described above, the therapeutic nucleic acid construct is preferably applied to the site of the cancer event (for example, by injection). However, it may also be applied to tissue in the vicinity of the cancer event or to a blood vessel supplying the cells predicted to undergo cancer.

P. Transgenic Animals

Transgenic animals containing mutant, knock-out or modified genes corresponding to the cDNA sequence of SEQ ID NO: 1, 3, 5, 7, 9, 11, 13 or 15, or the open reading frame encoding the polypeptide sequence of SEQ ID NO: 2, 4, 6, 8, 10, 12, 14 or 16, or fragments thereof having a consecutive sequence of at least about 3, 4, 5, 6, 10, 15, 20, 25, 30, 35 or more amino acid residues, are also included in the invention. Transgenic animals are genetically modified animals into which recombinant, exogenous or cloned genetic material has been experimentally transferred. Such genetic material is often referred to as a “transgene.” The nucleic acid sequence of the transgene, in this case a form of SEQ ID NO: 1, 3, 5, 7, 9, 11, 13 or 15, may be integrated either at a locus of a genome where that particular nucleic acid sequence is not otherwise normally found or at the normal locus for the transgene. The transgene may consist of nucleic acid sequences derived from the genome of the same species or of a different species than the species of the target animal.

In some embodiments, transgenic animals in which all or a portion of a gene comprising SEQ ID NO: 1, 3, 5, 7, 9, 11, 13 or 15 is deleted may be constructed. In those cases where the gene corresponding to SEQ ID NO: 1, 3, 5, 7, 9, 11, 13 or 15 contains one or more introns, the entire gene (all exons, introns and the regulatory sequences) may be deleted. Alternatively, less than the entire gene may be deleted. For example, a single exon and/or intron may be deleted, so as to create an animal expressing a modified version of a protein of the invention.

The term “germ cell line transgenic animal” refers to a transgenic animal in which the genetic alteration or genetic information was introduced into a germ line cell, thereby conferring the ability of the transgenic animal to transfer the genetic information to offspring. If such offspring in fact possess some or all of that alteration or genetic information, then they too are transgenic animals.

The alteration or genetic information may be foreign to the species of animal to which the recipient belongs, foreign only to the particular individual recipient, or may be genetic information already possessed by the recipient. In the last case, the altered or introduced gene may be expressed differently than the native gene.

Transgenic animals can be produced by a variety of different methods including transfection, electroporation, microinjection, gene targeting in embryonic stem cells and recombinant viral and retroviral infection (see, e.g., U.S. Pat. No. 4,736,866; U.S. Pat. No. 5,602,307; Mullins et al. (1993), Hypertension 22: 630-633; Brenin et al. (1997), Surg. Oncol. 6: 99-110; Recombinant Gene Expression Protocols (Methods in Molecular Biology, Vol. 62), Tuan, ed., Humana Press, Totowa, N.J., 1997).

A number of recombinant or transgenic mice have been produced, including those which express an activated oncogene sequence (U.S. Pat. No. 4,736,866); express simian SV40 T-antigen (U.S. Pat. No. 5,728,915); lack the expression of interferon regulatory factor 1 (IRF-1) (U.S. Pat. No. 5,731,490); exhibit dopaminergic dysfunction (U.S. Pat. No. 5,723,719); express at least one human gene which participates in blood pressure control (U.S. Pat. No. 5,731,489); display greater similarity to the conditions existing in naturally occurring Alzheimer's disease (U.S. Pat. No. 5,720,936); have a reduced capacity to mediate cellular adhesion (U.S. Pat. No. 5,602,307); possess a bovine growth hormone gene (Clutter et al. (1996), Genetics 143: 1753-1760); or are capable of generating a fully human antibody response (McCarthy (1997), Lancet 349: 405).

While mice and rats remain the animals of choice for most transgenic experimentation, in some instances it is preferable or even necessary to use alternative animal species. Transgenic procedures have been successfully utilized in a variety of non-murine animals, including sheep, goats, pigs, dogs, cats, monkeys, chimpanzees, hamsters, rabbits, cows and guinea pigs (see, e.g., Kim et al. (1997), Mol. Reprod. Dev. 46: 515-526; Houdebine (1995), Reprod. Nutr. Dev. 35: 609-617; Petters (1994), Reprod. Fertil. Dev. 6: 643-645; Schnieke et al. (1997), Science 278: 2130-2133; and Amoah (1997), J. Animal Sci. 75: 578-585).

The method of introduction of nucleic acid fragments into recombination competent mammalian cells can be by any method which favors co-transformation of multiple nucleic acid molecules. Detailed procedures for producing transgenic animals are readily available to one skilled in the art, including the disclosures in U.S. Pat. No. 5,489,743 and U.S. Pat. No. 5,602,307.

Q. Diagnostic Methods

As the genes and proteins of the invention are differentially expressed in cancerous tissues compared to non-cancerous tissues, the genes and proteins of the invention may be used to diagnose or monitor cancer, to track disease progression, or to differentiate cancerous tissue from non-cancerous tissue samples. One means of diagnosing cancer using the nucleic acid molecules or proteins of the invention involves obtaining tissue from living subjects.

Assays to detect nucleic acid or protein molecules of the invention may be in any available format. Typical assays for nucleic acid molecules include hybridization or PCR based formats. Typical assays for the detection of proteins, polypeptides or peptides of the invention include the use of antibody probes in any available format such as in situ binding assays, etc. (see Harlow & Lane, Antibodies: A Laboratory Manual, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., 1988). In preferred embodiments, assays are carried out with appropriate controls.

Generally, the diagnostics of the invention can be classified according to whether the embodiment is a nucleic acid or protein-based assay. Some diagnostic assays detect mutations or polymorphisms in the invention nucleic acids or proteins, which contribute to cancerous aberrations. Other diagnostic assays identify and distinguish defects in protein activity by detecting a level of invention RNA or protein in a tested organism that resembles the level of invention RNA or protein in an organism suffering from a disease, such as cancer, or by detecting a level of RNA or protein in a tested organism that differs from the level in an organism not suffering from a disease.

Additionally, the manufacture of kits that incorporate the reagents and methods described in the following embodiments, so as to allow for the rapid detection and identification of aberrations in protein activity or level, is contemplated. The diagnostic kits can include a nucleic acid probe or an antibody or combinations thereof, which specifically detect a mutant form of the invention protein, or a nucleic acid probe or an antibody or combinations thereof, which can be used to determine the level of RNA or protein expression of one or more invention proteins. The detection component of these kits will typically be supplied in combination with one or more of the following reagents. A support capable of absorbing or otherwise binding DNA, RNA, or protein will often be supplied. Available supports include membranes of nitrocellulose, nylon or derivatized nylon that can be characterized by bearing an array of positively charged substituents. One or more restriction enzymes, control reagents, buffers, amplification enzymes, and non-human polynucleotides like calf-thymus or salmon-sperm DNA can be supplied in these kits.

Useful nucleic acid-based diagnostic techniques include, but are not limited to, direct DNA sequencing, gradient gel electrophoresis, Southern Blot analysis, single-stranded conformation analysis (SSCA), RNAse protection assay, dot blot analysis, nucleic acid amplification, allele-specific PCR and combinations of these approaches. The starting point for these analyses is isolated or purified nucleic acid from a biological sample. It is contemplated that tissue biopsies would provide a good sample source. The nucleic acid is extracted from the sample and can be amplified by a DNA amplification technique such as the Polymerase Chain Reaction (PCR) using primers. Those of skill in the art will readily recognize methods available for confirming the presence of polymorphisms. In addition, any addressable array technology known in the art can be employed with this aspect of the invention. One particular embodiment of polynucleotide arrays is known as Genechips™, and has been generally described in U.S. Pat. No. 5,143,854 and PCT publications WO 90/15070 and 92/10092.

A wide variety of labels and conjugation techniques are known by those skilled in the art and can be used in various nucleic acid assays. There are several ways to produce labeled nucleic acids for hybridization or PCR including, but not limited to, oligolabeling, nick translation, end-labeling, or PCR amplification using a labeled nucleotide. Alternatively, a nucleic acid encoding an invention protein can be cloned into a vector for the production of an mRNA probe. Such vectors are known in the art, are commercially available, and can be used to synthesize RNA probes in vitro by addition of an appropriate RNA polymerase such as T7, T3 or SP6 and labeled nucleotides. A number of companies such as Pharmacia Biotech (Piscataway, N.J.), Promega (Madison, Wis.), and U.S. Biochemical Corp (Cleveland, Ohio) supply commercial kits and protocols for these procedures. Suitable reporter molecules or labels include radionuclides, enzymes, and fluorescent, chemiluminescent, or chromogenic agents, as well as substrates, cofactors, inhibitors, magnetic particles and the like.

In a preferred protein-based diagnostic, antibodies of the invention are attached to a support in an ordered array wherein a plurality of antibodies are attached to distinct regions of the support that do not overlap with each other. Those of skill in the art will readily recognize available assays that are protein-based diagnostics. Proteins are obtained from biological samples and are labeled by conventional approaches (e.g., radioactively, colorimetrically, or fluorescently). Employing labeled standards of a known concentration of mutant and/or wild-type invention protein, an investigator can accurately determine the concentration of the invention protein in a sample and from this information can assess the expression level of the particular form of the protein. Conventional methods in densitometry can also be used to more accurately determine the concentration or expression level of such protein. These approaches are also easily automated using technology known to those of skill in the art of high throughput diagnostic analysis. As detailed above, any addressable array technology known in the art can be employed with this aspect of the invention to display the protein arrays on chips in an attempt to maximize antibody binding patterns and diagnostic information.
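As an illustration of how labeled standards of known concentration can be used to estimate a sample concentration, the following minimal sketch interpolates linearly between neighboring standards. It assumes that signal increases with concentration across the standards; the function name and the example values are hypothetical, and other curve-fitting approaches could equally be used.

```python
def concentration_from_standards(signal, standards):
    """Estimate concentration by linear interpolation between standards.

    standards: iterable of (known_concentration, measured_signal) pairs.
    Assumes signal rises monotonically with concentration (an assumption
    of this sketch, not a statement from the source).
    """
    points = sorted(standards, key=lambda pair: pair[1])
    for (conc_lo, sig_lo), (conc_hi, sig_hi) in zip(points, points[1:]):
        if sig_lo <= signal <= sig_hi:
            if sig_hi == sig_lo:
                return conc_lo
            fraction = (signal - sig_lo) / (sig_hi - sig_lo)
            return conc_lo + fraction * (conc_hi - conc_lo)
    raise ValueError("signal lies outside the range covered by the standards")


# Hypothetical standard curve: (concentration, signal)
example_standards = [(0.0, 0.0), (10.0, 100.0), (20.0, 180.0)]
example_estimate = concentration_from_standards(140.0, example_standards)
```

Signals outside the range of the standards are rejected rather than extrapolated, since the response beyond the measured standards is unknown.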

As discussed above, the presence or detection of a polymorphism in an invention gene or protein can provide a diagnosis of a cancer or similar malady in an organism. Additional embodiments include the preparation of diagnostic kits comprising detection components, such as antibodies, specific for a particular polymorphic variant of invention gene or protein. The detection component will typically be supplied in combination with one or more of the following reagents. A support capable of absorbing or otherwise binding RNA or protein will often be supplied. Available supports for this purpose include, but are not limited to, membranes of nitrocellulose, nylon or derivatized nylon that can be characterized by bearing an array of positively charged substituents, and Genechips™ or their equivalents. One or more enzymes, such as Reverse Transcriptase and/or Taq polymerase, can be furnished in the kit, as can dNTPs, buffers, or non-human polynucleotides like calf-thymus or salmon-sperm DNA. Results from the kit assays can be interpreted by a healthcare provider or a diagnostic laboratory. Alternatively, diagnostic kits are manufactured and sold to private individuals for self-diagnosis.

In addition to diagnosing disease according to the presence or absence of a polymorphism, some diseases involving cancer result from skewed levels of invention protein or gene in particular tissues or aberrant patterns of invention protein expression. By monitoring the level of expression in various tissues, for example, a diagnosis can be made or a disease state can be identified. Similarly, by determining ratios of the level of expression of various invention proteins in specific tissues (e.g., patterns of expression) a prognosis of health or disease can be made. The levels of invention protein expression in various tissues from healthy individuals, as well as individuals suffering from cancers, are determined. These values can be recorded in a database and can be compared to values obtained from tested individuals. Additionally, the ratios or patterns of expression in various tissues from both healthy and diseased individuals are recorded in a database. These analyses are referred to as “disease state profiles” and by comparing one disease state profile (e.g., from a healthy or diseased individual) to a disease state profile from a tested individual, a clinician can rapidly diagnose the presence or absence of disease.
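The comparison of a tested individual's profile against recorded disease state profiles can be sketched as follows. This is a minimal illustration under stated assumptions: profiles are represented as dictionaries of expression ratios, similarity is measured as Euclidean distance between log2 ratios, and the gene and profile labels are hypothetical. None of these choices is specified in this disclosure.

```python
import math


def expression_ratios(levels, reference_gene):
    """Express each level relative to a chosen reference gene."""
    reference = levels[reference_gene]
    return {gene: value / reference
            for gene, value in levels.items() if gene != reference_gene}


def profile_distance(profile_a, profile_b):
    """Euclidean distance between log2 ratios of the genes both profiles share."""
    shared = sorted(set(profile_a) & set(profile_b))
    if not shared:
        raise ValueError("profiles share no genes")
    return math.sqrt(sum(
        (math.log2(profile_a[g]) - math.log2(profile_b[g])) ** 2 for g in shared))


def closest_profile(test_profile, database):
    """Return the label of the stored profile nearest to the tested profile."""
    return min(database, key=lambda label: profile_distance(test_profile, database[label]))


# Hypothetical database of disease state profiles (expression ratios)
example_database = {
    "healthy": {"geneA": 1.0, "geneB": 1.0},
    "cancer": {"geneA": 4.0, "geneB": 0.5},
}
example_match = closest_profile({"geneA": 3.5, "geneB": 0.6}, example_database)
```

A nearest-profile match of this kind only ranks the stored profiles; any diagnostic use would also need a measure of how close the best match actually is.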

The nucleic acid and protein-based diagnostic techniques described above can be used to detect the level or amount or ratio of expression of invention genes or proteins in a tissue. Through quantitative Northern hybridizations, in situ analysis, immunohistochemistry, ELISA, genechip array technology, PCR, and Western blots, for example, the amount or level of expression of RNA or protein for a particular invention protein (wild-type or mutant) can be rapidly determined, and from this information ratios of expression can be ascertained. Alternatively, the invention proteins to be analyzed can be family members that are currently unknown but which are identified based on their possession of one or more of the homology regions described above.

Without further description, it is believed that one of ordinary skill in the art can, using the preceding description and the following illustrative examples, make and utilize the compounds of the present invention and practice the claimed methods. The following working examples, therefore, specifically point out preferred embodiments of the present invention, and are not to be construed as limiting in any way the remainder of the disclosure.

EXAMPLES

Example 1 Identification of Differentially Expressed mRNA in Cancers-1

Global changes in gene expression between tumor biopsies and normal tissues have been examined using the GeneExpress Oncology Datasuite™ of Gene Logic, Inc. (Gaithersburg, Md.). The database includes the gene expression profiles, generated by using the Affymetrix Human Genome U95 array, derived from normal and cancer tissue samples from many different organs. Among the tissue samples in the database, applicants analyzed the expression profiles of normal and cancer tissue sets from breast, colon, esophagus, kidney, liver, lung, lymph node, ovary, pancreas, prostate, rectum, and stomach.

The Affymetrix Human Genome U95 array contains 63,175 probe sets. A probe set is a set of probes to detect one transcript (a gene or a cDNA clone), and usually consists of 16-20 oligonucleotide probe pairs. These probe pairs include perfectly matched sets and mismatched sets, both of which are necessary for the calculation of the average difference. The average difference serves as a relative indicator of the level of expression of a transcript and is a measure of the intensity difference for each probe pair, calculated by subtracting the intensity of the mismatch from the intensity of the perfect match. This takes into consideration variability in hybridization among probe pairs and other hybridization artifacts that could affect the fluorescence intensities. Using the average difference value that has been calculated, an absolute call for each gene is made: “Absent” (=not detected), “Present” (=detected) or “Marginal” (=not clearly Absent or Present).
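As a rough illustration of the average difference described above, the following Python sketch averages the perfect-match minus mismatch intensities over the probe pairs of one probe set. The function name and the intensity values are hypothetical, and the sketch leaves out any outlier handling and the absolute-call logic that the vendor software may apply, so it is a simplified model rather than the Microarray Suite algorithm.

```python
from statistics import mean


def average_difference(probe_pairs):
    """Mean of (perfect match - mismatch) over the probe pairs of one probe set.

    probe_pairs: iterable of (pm_intensity, mm_intensity) tuples.
    """
    pairs = list(probe_pairs)
    if not pairs:
        raise ValueError("a probe set needs at least one probe pair")
    return mean(pm - mm for pm, mm in pairs)


# Hypothetical fluorescence intensities (PM, MM) for a small probe set.
example_pairs = [(520.0, 130.0), (610.0, 150.0), (480.0, 160.0), (700.0, 200.0)]
print(average_difference(example_pairs))
```

A larger positive value suggests more specific hybridization to the perfect-match probes; how such a value would be turned into a Present, Marginal or Absent call is not modeled here.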

Differential expression of genes between cancerous and normal tissue samples was determined with the following statistical methods. (1) For each probe set, average difference values and absolute calls were determined by Affymetrix Microarray Suite (v4.0). (2) In a given sample set, outliers among the tissue samples were detected by Principal Component Analysis (PCA) using the MatLab program (The MathWorks, Inc., Natick, Mass.). The data points used in the PCA were the average differences of randomly selected probe sets (5,000-6,000 probe sets). Outliers were excluded from further analysis. (3) Variations of gene expression were analyzed by using the Fold Change Analysis tool of the GeneExpress program. The fold change (cancerous/normal) was calculated by comparing the mean average difference for each gene in a cancerous sample set against the mean average difference of that gene in the normal tissue sample set. Genes showing at least 3-fold increases or decreases in expression level were obtained. Genes were included in the analysis if they had a p-value of less than or equal to 0.05 as determined by an Analysis of Variance Test (Steel et al., Principles and Procedures of Statistics: A Biometrical Approach, Third Ed., McGraw-Hill, 1997). (4) Genes showing differential expression in at least 5 different cancer types were selected.
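Steps (3) and (4) above can be sketched in Python as a simple filter. This is only an illustration under stated assumptions: the function names, the input layout and the example numbers are hypothetical, the p-values are taken as already computed (the Analysis of Variance Test itself is not reimplemented), and the GeneExpress tool may differ in detail.

```python
def fold_change(cancer_mean, normal_mean):
    """Return (magnitude, direction); magnitude is always >= 1."""
    if cancer_mean <= 0 or normal_mean <= 0:
        raise ValueError("mean average differences must be positive")
    ratio = cancer_mean / normal_mean
    return (ratio, "up") if ratio >= 1.0 else (1.0 / ratio, "down")


def passes_filter(cancer_mean, normal_mean, p_value, min_fold=3.0, alpha=0.05):
    """At least 3-fold change (either direction) with p <= 0.05."""
    magnitude, _direction = fold_change(cancer_mean, normal_mean)
    return magnitude >= min_fold and p_value <= alpha


def select_genes(results, min_cancer_types=5):
    """results: {gene: {cancer_type: (cancer_mean, normal_mean, p_value)}}."""
    selected = []
    for gene, by_cancer_type in results.items():
        hits = sum(1 for values in by_cancer_type.values() if passes_filter(*values))
        if hits >= min_cancer_types:
            selected.append(gene)
    return selected


# Hypothetical input: geneA changes 5-fold in six cancer types, geneB barely changes.
example = {
    "geneA": {f"type{i}": (300.0, 60.0, 0.001) for i in range(6)},
    "geneB": {f"type{i}": (100.0, 90.0, 0.40) for i in range(6)},
}
print(select_genes(example))
```

Reporting the magnitude as a value of at least 1 together with a direction mirrors the "Fold Change" and "Direction" columns used in the tables of these examples.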

Analysis of the chip data showed that the expression of the marker LFG1 was significantly up-regulated in cancer tissue samples compared to samples from normal tissue. The expression level of LFG1 (SEQ ID NO: 1 or 3) can be measured by chip sequence fragment no. 91875_s_at on Affymetrix GeneChips™ U95. The 91875_s_at sequence is derived from the EST AI053741. The expression levels of 91875_s_at in various malignant neoplasms, compared to normal control tissues, are shown in Table 1, where the fold change, the direction of the change (up- or down-regulation), and the p-value are also indicated. The fold change (cancerous/normal) was calculated by comparing the geometric mean of average difference in a cancerous sample set against the geometric mean of average difference in the normal tissue sample set. A fold change greater than 1.5 was considered to be significant (Wodicka et al. (1997), Nature Biotech. 15:1359-1367). Also indicated in Table 1 are, for each tissue type, the numbers of samples that are called present, absent, or marginal, together with the total number of samples in that sample set. These data indicate that up-regulation of LFG1 may be diagnostic for cancer.

TABLE 1

                                             Geometric  Number of Samples                 Fold
Tissue      Pathology/Morphology             Mean       Total  Present  Marginal  Absent  Change  Direction  p-value
BREAST      NORMAL TISSUE, NOS               22.71      34     8        4         22
            INFILTRATING DUCT CARCINOMA      184.04     61     61       0         0       8.11    up         0
            INFILTRATING LOBULAR CARCINOMA   104.36     10     9        0         1       4.60    up         0.00456
COLON       NORMAL TISSUE, NOS               76.46      24     23       0         1
            ADENOCARCINOMA, NOS              244.76     36     35       0         1       3.20    up         0.00001
ESOPHAGUS   NORMAL TISSUE, NOS               50.47      18     16       1         1
            ADENOCARCINOMA, NOS              297.56     8      8        0         0       5.90    up         0.00367
KIDNEY      NORMAL TISSUE, NOS               20.00      25     1        0         24
            CLEAR CELL CARCINOMA             60.48      11     10       1         0       3.02    up         0.00082
            RENAL CELL CARCINOMA             65.01      16     13       0         3       3.25    up         0.00011
LIVER       NORMAL TISSUE, NOS               22.06      19     3        0         16
            HEPATOCELLULAR CARCINOMA, NOS    86.74      23     21       0         2       3.93    up         0
LUNG        NORMAL TISSUE, NOS               21.27      32     6        0         26
            ADENOCARCINOMA, NOS              122.81     39     38       0         1       5.77    up         0
OVARY       NORMAL TISSUE, NOS               20.21      23     0        0         23
            PAPILLARY SEROUS ADENOCARCINOMA  112.80     23     21       0         2       5.58    up         0
PANCREAS    NORMAL TISSUE, NOS               20.02      20     1        0         19
            ADENOCARCINOMA, NOS              72.55      25     22       0         3       3.62    up         0
RECTUM      NORMAL TISSUE, NOS               78.86      20     20       0         0
            ADENOCARCINOMA, NOS              259.95     22     22       0         0       3.30    up         0.00008
STOMACH     NORMAL TISSUE, NOS               36.06      18     7        0         11
            ADENOCARCINOMA, NOS              218.74     38     36       0         2       6.07    up         0
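The fold-change columns of Table 1 can be approximated from the published geometric means. The Python sketch below is illustrative only: the helper names are hypothetical, `statistics.geometric_mean` requires Python 3.8 or later, and recomputing from rounded means will not always match the tabulated value to the last digit.

```python
from statistics import geometric_mean


def fold_from_means(cancer_gm, normal_gm):
    """Fold change (magnitude >= 1) and direction from two geometric means."""
    ratio = cancer_gm / normal_gm
    return (ratio, "up") if ratio >= 1.0 else (1.0 / ratio, "down")


def fold_from_samples(cancer_values, normal_values):
    """Same calculation starting from per-sample average differences (all > 0)."""
    return fold_from_means(geometric_mean(cancer_values), geometric_mean(normal_values))


def is_significant(magnitude, threshold=1.5):
    """Threshold stated in the text: a fold change greater than 1.5."""
    return magnitude > threshold


# Colon row of Table 1: adenocarcinoma 244.76 versus normal tissue 76.46.
magnitude, direction = fold_from_means(244.76, 76.46)
print(f"{magnitude:.2f} {direction} significant={is_significant(magnitude)}")
```

A geometric mean is less sensitive than an arithmetic mean to a few samples with very high signal, which is one plausible reason for its use with intensity data of this kind.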

The GeneChip expression results, determined by sample binding to chip sequence fragment no. 91875_s_at, were validated by quantitative RT-PCR (Q-RT-PCR) using the Taqman® assay (Perkin-Elmer). PCR primers (5′-GCTGAAGCAGGAAAATCGCTT-3′ (SEQ ID NO: 17) and 5′-TGAGACGGAGTCTCACTCGGT-3′ (SEQ ID NO: 18)) designed based on the sequence information file of the specific Affymetrix fragment (91875_s_at) were used in the assay. The target gene in each RNA sample (10 ng of total RNA) was assayed relative to a reference gene. For this purpose, primers (5′-GTTTTTCCTAATTTTGGCATGAAC-3′ (SEQ ID NO: 19) and 5′-CGCCCAAGCTTTTCCTTTT-3′ (SEQ ID NO: 20)) specific to the CTBP1 gene (C-terminal binding protein 1) were used to serve as control primers. This approach provides the relative expression of the target mRNA, measured as its cycle threshold (Ct) value relative to the Ct value of CTBP1. The sample panel included total RNA pairs of normal and tumor tissues from colon, kidney, liver, lung, ovary, stomach and pancreas (Ambion, Inc., Austin, Tex.). The Q-RT-PCR data confirm the up-regulation of LFG1 in cancer compared to normal samples.
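One common way to turn target and reference Ct values into a relative expression figure is the comparative Ct calculation sketched below in Python. The text does not state which formula was used, so this is an assumption: the sketch presumes roughly 100% amplification efficiency (a doubling per cycle), and the function names and Ct values are hypothetical.

```python
def relative_expression(ct_target, ct_reference):
    """Target level normalized to the reference gene: 2 ** -(Ct_target - Ct_reference)."""
    return 2.0 ** -(ct_target - ct_reference)


def tumor_to_normal_ratio(ct_target_tumor, ct_ref_tumor, ct_target_normal, ct_ref_normal):
    """Ratio of normalized target expression in tumor versus matched normal tissue."""
    tumor = relative_expression(ct_target_tumor, ct_ref_tumor)
    normal = relative_expression(ct_target_normal, ct_ref_normal)
    return tumor / normal


# Hypothetical Ct values: the target crosses threshold 3 cycles earlier in tumor
# than in normal tissue, while the reference gene (e.g. CTBP1) is unchanged.
print(tumor_to_normal_ratio(24.0, 22.0, 27.0, 22.0))
```

A ratio above 1 would be read as up-regulation in the tumor sample and a ratio below 1 as down-regulation.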

Example 2 Cloning of Full-Length Human cDNA (LFG1) Corresponding to Differentially Expressed mRNA Species

The full-length cDNA having SEQ ID NO: 1 or 3 was obtained by polymerase chain reaction (PCR) and rapid amplification of cDNA ends (RACE) using a cDNA library from human heart (ResGen, Huntsville, Ala.). Gene-specific oligos for PCR (5′-CACCCTTTGCCTCTGTCACTTCCGCA-3′ (SEQ ID NO: 21), 5′-GCTGGAGCACCAGGACTGCATTG-3′ (SEQ ID NO: 22), 5′-GGAGCTGAGCAGCAGTGTAATGAA-3′ (SEQ ID NO: 23), 5′-GAGGCCTGCCTGAAGGAGGAGCTTC-3′ (SEQ ID NO: 24), 5′-TCTGGAAGTAGTGCAGACGCCTCAGG-3′ (SEQ ID NO: 25), 5′-AGCCAACGTCGGCTTTGTTATCCAGC-3′ (SEQ ID NO: 26), 5′-GCTGTCAGATATGATGGTTCTGGAC-3′ (SEQ ID NO: 27), 5′-CCAGCCTCACCACTGTTGGGTTGC-3′ (SEQ ID NO: 28), 5′-CATTCTCTGAGCTGTATTAGTGT-3′ (SEQ ID NO: 29), 5′-CCTGAGCTGGAATGACCTGCA-3′ (SEQ ID NO: 30), 5′-CTTTGTGTTGGCTGCAGCCACA-3′ (SEQ ID NO: 31), 5′-TGAGGAGAGACTTTGCTGACTGGT-3′ (SEQ ID NO: 32), 5′-GTCCTGTCTGGCGGTGCCGA-3′ (SEQ ID NO: 33), 5′-GCTCCAGGATCCCCTGTCACCTGGGCCTTCTGCCT=GGCT-3′ (SEQ ID NO: 34), 5′-CCATATGGAGAGGAGAGCAGCGGGCCCA-3′ (SEQ ID NO: 35), 5′-GAAGGAGGAACATGGAGAGGAGA-3′ (SEQ ID NO: 36), 5′-CCATATGCCCCGGGTAGTCTACTGCAT-3′ (SEQ ID NO: 37), and 5′-GTCGACTCGAGTCACTTCCGCAAAAACTTCTTG-3′ (SEQ ID NO: 38)) and RACE (5′-TCCATTCCGAAGGCTCTCCTCC-3′ (SEQ ID NO: 39), 5′-GTCTGTGTGACGGAAATGTAAGC-3′ (SEQ ID NO: 40), and 5′-GAAGGTCGAAGGCAGACCGATGT-3′ (SEQ ID NO: 41)) were designed based on predicted genes containing the 91875_s_at sequence using the Human Genome Browser (University of California, Santa Cruz). The products amplified with the primers were incorporated into the PCR4-Topo vector using the Topo Cloning System (Invitrogen, Carlsbad, Calif.), followed by sequencing.

The nucleotide sequences of the full-length human cDNAs corresponding to the differentially regulated mRNA detected above are set forth in SEQ ID NOS: 1 and 3. In the former, the cDNA comprises 5293 base pairs. In the latter, the cDNA comprises 5317 base pairs.

An open reading frame within the cDNA nucleotide sequence of SEQ ID NO: 1, at nucleotides 390-4880 (390-4883 including the stop codon), encodes a protein of 1497 amino acids. The amino acid sequence corresponding to a predicted protein encoded by SEQ ID NO: 1 is set forth in SEQ ID NO: 2. FIG. 2 shows the results of a hydrophobicity analysis of the amino acid sequence of SEQ ID NO: 2 using Kyte-Doolittle values (Kyte and Doolittle (1982), J. Mol. Biol. 157:105-142). Hydrophilic regions may be used to produce antigenic peptides, as described above.

An open reading frame within the cDNA nucleotide sequence of SEQ ID NO: 3, at nucleotides 12-4904 (12-4907 including the stop codon), encodes a protein of 1631 amino acids. The amino acid sequence corresponding to a predicted protein encoded by SEQ ID NO: 3 is set forth in SEQ ID NO: 4. FIG. 3 shows the results of a hydrophobicity analysis of the amino acid sequence of SEQ ID NO: 4 using Kyte-Doolittle values (Kyte and Doolittle (1982), J. Mol. Biol. 157:105-142). Hydrophilic regions may be used to produce antigenic peptides, as described above.

The protein sequence of SEQ ID NO: 2 is identical to that of SEQ ID NO: 4, except that SEQ ID NO: 2 lacks the first 134 amino acids at the N-terminus of SEQ ID NO: 4.
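The open reading frame coordinates and protein lengths given for LFG1 can be cross-checked with simple arithmetic: the number of coding nucleotides (stop codon excluded) divided by three gives the number of amino acids. The short Python sketch below does this; the function name is hypothetical.

```python
def orf_protein_length(start, end):
    """Amino acids encoded by nucleotides start..end (1-based, inclusive, no stop codon)."""
    coding_nucleotides = end - start + 1
    if coding_nucleotides <= 0 or coding_nucleotides % 3 != 0:
        raise ValueError("coordinates do not describe a whole number of codons")
    return coding_nucleotides // 3


seq1_length = orf_protein_length(390, 4880)   # ORF of SEQ ID NO: 1
seq3_length = orf_protein_length(12, 4904)    # ORF of SEQ ID NO: 3
print(seq1_length, seq3_length, seq3_length - seq1_length)
```

The two lengths agree with the stated 1497 and 1631 amino acids, and their difference agrees with the 134 N-terminal residues that SEQ ID NO: 2 lacks relative to SEQ ID NO: 4.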

SEQ ID NOS: 2 and 4 contain a Calponin homology domain (amino acid positions 38-145 of SEQ ID NO: 4), an IQ domain for calmodulin-binding (amino acid positions 629-646 of SEQ ID NO: 2 and amino acid positions 763-780 of SEQ ID NO: 4), a RasGAP domain (amino acid positions 858-1195 of SEQ ID NO: 2 and amino acid positions 992-1329 of SEQ ID NO: 4), and a RasGAP C-terminal domain (amino acid positions 1298-1421 of SEQ ID NO: 2 and amino acid positions 1432-1555 of SEQ ID NO: 4). SEQ ID NOS: 2 and 4 are similar to IQGAP proteins (Weissbach et al. (1994), J Biol Chem 269:20517-20521; Brill et al. (1996), Mol Cell Biol 16:4869-4878). IQGAP binds to and modulates the function of proteins involved in cytoskeletal structure, cell-cell adhesion, and proliferation signaling (Fukada et al. (2002), Cell 109: 1-20; Briggs et al. (2002), J Biol Chem 277: 7453-7465; McCallum et al. (1998), J Biol Chem 273: 22537-22544). IQGAP1-deficient mice exhibited a significant increase in late-onset gastric hyperplasia relative to wild-type (Li et al. (2000), Mol Cell Biol 20: 697-701).
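Because SEQ ID NO: 2 lacks the first 134 residues of SEQ ID NO: 4, the paired domain coordinates above differ by a constant offset. The Python sketch below makes that mapping explicit; the names are hypothetical and the dictionary simply restates the positions given in the text.

```python
N_TERMINAL_OFFSET = 134  # residues present in SEQ ID NO: 4 but absent from SEQ ID NO: 2


def seq2_to_seq4(position):
    """Map a residue position in SEQ ID NO: 2 to the same residue in SEQ ID NO: 4."""
    return position + N_TERMINAL_OFFSET


def seq4_to_seq2(position):
    """Inverse mapping; returns None for residues missing from SEQ ID NO: 2."""
    if position <= N_TERMINAL_OFFSET:
        return None
    return position - N_TERMINAL_OFFSET


# Domain ranges in SEQ ID NO: 2 numbering, as stated in the text.
domains_seq2 = {
    "IQ": (629, 646),
    "RasGAP": (858, 1195),
    "RasGAP C-terminal": (1298, 1421),
}
for name, (start, end) in domains_seq2.items():
    print(name, seq2_to_seq4(start), seq2_to_seq4(end))
```

The same mapping shows why the Calponin homology domain is given only in SEQ ID NO: 4 numbering: its start (position 38) falls within the 134 residues that SEQ ID NO: 2 lacks.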

Analysis by Northern blot was performed to determine the size of the mRNA transcripts that correspond to LFG1. A Northern blot containing total RNAs from various human tissues was used (Human 12-Lane MTN Blot, Clontech, Palo Alto, Calif.), and an EST containing 91875_s_at sequence was radioactively labeled by the random primer method and used to probe the blot. The blot was hybridized in 50% formamide, 5×SSPE, 0.1% SDS, 5× Denhardt's solution, and 0.2 mg/ml herring sperm DNA at 42° C. and washed with 0.2×SSC containing 0.1% SDS at room temperature. The Northern blot showed three transcripts for this gene, which are approximately 7.2 kb, and 6.3 kb in size. This corresponds to the sizes of the LFG1 clones (SEQ ID NO: 1 and 3).

Example 3 Identification of Differentially Expressed mRNA in Cancers-2

The process in EXAMPLE 1 was repeated except that the marker LFG2 was used instead of the marker LFG1.

Analysis of the chip data showed that the expression of the marker LFG2 was significantly down-regulated in cancer tissue samples compared to samples from normal tissue. The expression level of LFG2 (SEQ ID NO: 5) can be measured by chip sequence fragment no. 82941_at on Affymetrix GeneChips® U95. The 82941_at sequence is derived from the EST AI277612. The expression levels of 82941_at in various malignant neoplasms, compared to normal control tissues, are shown in Table 2, where the fold change, the direction of the change (up- or down-regulation), and the p-value are also indicated. The fold change (cancerous/normal) was calculated by comparing the geometric mean of average difference in a cancerous sample set against the geometric mean of average difference in the normal tissue sample set. A fold change greater than 1.5 was considered to be significant (Wodicka et al. (1997), Nature Biotech. 15:1359-1367). Also indicated in Table 2 are, for each tissue type, the numbers of samples that are called present, absent, or marginal, together with the total number of samples in that sample set.
These data indicate that down-regulation of LFG2 may be diagnostic for cancer.

TABLE 2

                                             Geometric  Number of Samples                 Fold
Tissue      Pathology/Morphology             Mean       Total  Present  Marginal  Absent  Change  Direction  p-value
BREAST      NORMAL TISSUE, NOS               1147.66    34     34       0         0
            INFILTRATING DUCT CARCINOMA      129.77     61     26       3         32      8.71    down       0
            INFILTRATING LOBULAR CARCINOMA   183.37     10     6        1         3       5.48    down       0.00002
COLON       NORMAL TISSUE, NOS               890.06     24     23       1         0
            ADENOCARCINOMA, NOS              163.35     36     17       1         18      5.39    down       0
ESOPHAGUS   NORMAL TISSUE, NOS               612.34     18     18       0         0
            ADENOCARCINOMA, NOS              265.11     8      7        1         0       2.31    down       0.02218
LIVER       NORMAL TISSUE, NOS               182.73     19     11       1         7
            HEPATOCELLULAR CARCINOMA, NOS    114.69     23     7        1         15      1.51    down       0.01211
LUNG        NORMAL TISSUE, NOS               535.64     32     30       2         0
            ADENOCARCINOMA, NOS              119.36     39     17       3         19      4.27    down       0
LYMPH NODE  NORMAL TISSUE, NOS               454.08     9      7        0         2
            MALIGNANT LYMPHOMA, NOS          123.13     12     5        0         7       3.24    down       0.02245
OVARY       NORMAL TISSUE, NOS               279.99     23     21       0         2
            PAPILLARY SEROUS ADENOCARCINOMA  85.45      23     7        1         15      3.5     down       0
PROSTATE    NORMAL TISSUE, NOS               195.77     19     13       1         5
            ADENOCARCINOMA, NOS              80.06      19     2        2         15      2.57    down       0.00011
RECTUM      NORMAL TISSUE, NOS               943.86     20     19       0         1
            ADENOCARCINOMA, NOS              176.45     22     13       2         7       5.2     down       0
STOMACH     NORMAL TISSUE, NOS               414.40     18     16       0         2
            ADENOCARCINOMA, NOS              125.39     38     17       2         19      3.21    down       0

The GeneChip expression results, determined by sample binding to chip sequence fragment no. 82941_at, were validated by quantitative RT-PCR (Q-RT-PCR) using the Taqman® assay (Perkin-Elmer). PCR primers (5′-GAATGTGTCAGAGACAAGTGCAGC-3′ (SEQ ID NO: 42) and 5′-TGTAGAAACTCTTGGACTAATGGAGG-3′ (SEQ ID NO: 43)) designed based on the sequence information file of the EST containing the Affymetrix fragment (82941_at) were used in the assay. The target gene in each RNA sample (10 ng of total RNA) was assayed relative to a reference gene. For this purpose, primers (5′-GTTTTTCCTAATTTTGGCATGAAC-3′ (SEQ ID NO: 19) and 5′-CGCCCAAGCTTTTCCTTTT-3′ (SEQ ID NO: 20)) specific to the CTBP1 gene (C-terminal binding protein 1) were used to serve as control primers. This approach provides the relative expression of the target mRNA, measured as its cycle threshold (Ct) value relative to the Ct value of CTBP1. The sample panel included total RNA pairs of normal and tumor tissues from colon, liver, lung, ovary, and stomach (Ambion, Inc., Austin, Tex.). The Q-RT-PCR data confirm the down-regulation of LFG2 in cancer compared to normal samples.

Example 4 Cloning of Full-Length Human cDNA (LFG2) Corresponding to Differentially Expressed mRNA Species

The full-length cDNA having SEQ ID NO: 5 was obtained by the oligo-pulling method using the GeneTrapper assay (Life Technologies, Rockville, Md.). Briefly, a gene-specific oligo (5′-GAATGTGTCAGAGACAAGTGCAGC-3′ (SEQ ID NO: 42)) was designed based on the sequence of the EST containing 82941_at sequence. The oligo was labeled with biotin and used to hybridize with 5 μg of single strand plasmid DNA (cDNA recombinants) from a poorly differentiated stomach adenocarcinoma library (NCI CGAP Gas4) (ResGen, Huntsville, Ala.) following the procedures of Sambrook et al. The hybridized cDNAs were separated by streptavidin-conjugated beads and eluted by heating. The eluted cDNA was converted to double strand plasmid DNA and used to transform E. coli cells (DH10B), and the longest cDNA was screened. After positive selection was confirmed by PCR using gene-specific primers, the cDNA clone was subjected to DNA sequencing.

The nucleotide sequence of the full-length human cDNA corresponding to the differentially regulated mRNA detected above is set forth in SEQ ID NO: 5. The cDNA comprises 3608 base pairs.

An open reading frame within the cDNA nucleotide sequence of SEQ ID NO: 5, at nucleotides 424-1908 (424-1911 including the stop codon), encodes a protein of 495 amino acids. The amino acid sequence corresponding to a predicted protein encoded by SEQ ID NO: 5 is set forth in SEQ ID NO: 6.

SEQ ID NO: 6 has homology to scavenger receptors, which are involved in endocytosis of selected polyanionic ligands, phagocytosis of apoptotic cells and bacteria, cell adhesion, and development of atherosclerosis (Peiser et al. (2002), Curr. Opin. Immunol. 14:123-128; Resnick et al. (1994), Trends Biol. Sci. 19:5-8). Based on published studies of scavenger receptors, SEQ ID NO: 6 contains a cytoplasmic domain (amino acid positions 1-35), a transmembrane domain (amino acid positions 36-58), an α-helical coiled-coil domain (amino acid positions 90-301), a collagen-like domain (amino acid positions 305-380), and a scavenger receptor cysteine-rich (SRCR) domain (amino acid positions 393-493). The SRCR domain contains six cysteine residues (amino acid positions 418, 431, 462, 472, 482, and 492), which may participate in intradomain disulfide bonds. SEQ ID NO: 6 also exhibits homology to a mouse homologue (GenBank Accession No. BC016096). It shows 70% identity over the entire contiguous sequence.

FIG. 4 shows the results of a hydrophobicity analysis of the amino acid sequence of SEQ ID NO: 6 using Kyte-Doolittle values (Kyte and Doolittle (1982), J. Mol. Biol. 157:105-142). Hydrophilic regions may be used to produce antigenic peptides, as described above.
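A hydrophobicity analysis of this kind is typically a sliding-window average of per-residue Kyte-Doolittle values. The Python sketch below illustrates the idea; the window length of 9 is an assumption (the text does not state the window used for the figures), and the function name and the test peptide are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=9):
    """Mean Kyte-Doolittle value for every full window along the sequence."""
    residues = sequence.upper()
    if window < 1 or window > len(residues):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KYTE_DOOLITTLE[aa] for aa in residues]  # KeyError on non-standard residues
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(residues) - window + 1)
    ]


# Hypothetical peptide: a hydrophobic stretch followed by a charged stretch.
profile = hydropathy_profile("LLIVALLIVAKKDERKKDER", window=9)
print([round(value, 2) for value in profile])
```

Windows with strongly negative averages correspond to the hydrophilic regions that the text proposes as candidates for antigenic peptides; sustained positive stretches suggest membrane-spanning or buried segments.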

Analysis by Northern blot was performed to determine the size of the mRNA transcripts that correspond to LFG2. A Northern blot containing total RNAs from various human tissues was used (Human MTN Blot, Clontech, Palo Alto, Calif.), and the EST containing 82941_at sequence was radioactively labeled by the random primer method and used to probe the blot. The blot was hybridized in 50% formamide, 5×SSPE, 0.1% SDS, 5× Denhardt's solution, and 0.2 mg/ml herring sperm DNA at 42° C. and washed with 0.2×SSC containing 0.1% SDS at room temperature. The Northern blot showed a single transcript for this gene, which is approximately 3.7 kb in size. This corresponds to the size of the LFG2 clone (SEQ ID NO: 5).

Example 5 Identification of Differentially Expressed mRNA in Cancers-3

The process in EXAMPLE 1 was repeated except that the marker LFG3 was used instead of the marker LFG1.

Analysis of the chip data showed that the expression of the marker LFG3 was significantly down-regulated in cancer tissue samples compared to samples from normal tissue. The expression level of LFG3 (SEQ ID NO: 7) can be measured by chip sequence fragment no. 46104_at on Affymetrix GeneChips® U95. The 46104_at sequence is derived from the EST AA772055. The expression levels of 46104_at in various malignant neoplasms, compared to normal control tissues, are shown in Table 3, where the fold change, the direction of the change (up- or down-regulation), and the p-value are also indicated. The fold change (cancerous/normal) was calculated by comparing the geometric mean of average difference in a cancerous sample set against the geometric mean of average difference in the normal tissue sample set. A fold change greater than 1.5 was considered to be significant (Wodicka et al. (1997), Nature Biotech. 15:1359-1367). Also indicated in Table 3 are, for each tissue type, the numbers of samples that are called present, absent, or marginal, together with the total number of samples in that sample set.
These data indicate that down-regulation of LFG3 may be diagnostic for cancer.

TABLE 3

                                             Geometric  Number of Samples                 Fold
Tissue      Pathology/Morphology             Mean       Total  Present  Marginal  Absent  Change  Direction  p-value
BREAST      NORMAL TISSUE, NOS               64.52      34     31       0         3
            INFILTRATING DUCT CARCINOMA      27.24      61     18       1         42      2.25    down       0
            INFILTRATING LOBULAR CARCINOMA   29.52      10     4        0         6       2.21    down       0.00004
COLON       NORMAL TISSUE, NOS               315.46     24     24       0         0
            ADENOCARCINOMA, NOS              102.99     36     31       0         5       3.02    down       0.00016
ESOPHAGUS   NORMAL TISSUE, NOS               272.48     18     17       0         1
            ADENOCARCINOMA, NOS              41.25      8      6        0         2       6.60    down       0.00001
KIDNEY      NORMAL TISSUE, NOS               2626.88    25     25       0         0
            CLEAR CELL ADENOCARCINOMA, NOS   344.66     11     11       0         0       7.62    down       0.00003
            RENAL CELL CARCINOMA             355.71     16     14       0         2       7.38    down       0.00005
OVARY       NORMAL TISSUE, NOS               1098.41    23     23       0         0
            PAPILLARY SEROUS ADENOCARCINOMA  178.15     23     22       0         1       6.17    down       0
PROSTATE    NORMAL TISSUE, NOS               274.49     19     19       0         0
            ADENOCARCINOMA, NOS              117.26     19     18       0         1       2.34    down       0.00016
RECTUM      NORMAL TISSUE, NOS               410.22     20     20       0         0
            ADENOCARCINOMA, NOS              72.98      22     16       0         6       5.38    down       0
STOMACH     NORMAL TISSUE, NOS               71.10      18     10       0         8
            ADENOCARCINOMA, NOS              35.49      38     15       1         22      1.96    down       0.00459

The GeneChip expression results, determined by sample binding to chip sequence fragment no. 46104_at, were validated by quantitative RT-PCR (Q-RT-PCR) using the Taqman® assay (Perkin-Elmer). PCR primers (5′-GTATGCATCAGAATTCCCTATAGATCTTT-3′ (SEQ ID NO: 44) and 5′-TAGATGTTTGGGCAACAGCCT-3′ (SEQ ID NO: 45)) designed based on the sequence information file of the EST containing the Affymetrix fragment (46104_at) were used in the assay. The target gene in each RNA sample (10 ng of total RNA) was assayed relative to a reference gene. For this purpose, primers (5′-GTTTTTCCTAATTTTGGCATGAAC-3′ (SEQ ID NO: 19) and 5′-CGCCCAAGCTTTTCCTTTT-3′ (SEQ ID NO: 20)) specific to the CTBP1 gene (C-terminal binding protein 1) were used to serve as control primers. This approach provides the relative expression of the target mRNA, measured as its cycle threshold (Ct) value relative to the Ct value of CTBP1. The sample panel included total RNA pairs of normal and tumor tissues from colon, kidney, ovary, pancreas, and stomach (Ambion, Inc., Austin, Tex.). The Q-RT-PCR data confirm the down-regulation of LFG3 in cancer compared to normal samples.

Example 6 Cloning of Full-Length Human cDNA (LFG3) Corresponding to Differentially Expressed mRNA Species

The full-length cDNA having SEQ ID NO: 7 was obtained by the oligo-pulling method using the GeneTrapper assay (Life Technologies, Rockville, Md.). Briefly, a gene-specific oligo (5′-GTATGCATCAGAATTCCCTATAGATCTTT-3′ (SEQ ID NO: 44)) was designed based on the sequence of the EST containing 46104_at sequence. The oligo was labeled with biotin and used to hybridize with 5 μg of single strand plasmid DNA (cDNA recombinants) from human fetal kidney (ResGen, Huntsville, Ala.) following the procedures of Sambrook et al. The hybridized cDNAs were separated by streptavidin-conjugated beads and eluted by heating. The eluted cDNA was converted to double strand plasmid DNA and used to transform E. coli cells (DH10B), and the longest cDNA was screened. After positive selection was confirmed by PCR using gene-specific primers, the cDNA clone was subjected to DNA sequencing. The 5′-end of LFG3 was identified by rapid amplification of cDNA ends (RACE) using the cDNA prepared from human fetal kidney (Clontech, Palo Alto, Calif.) and a gene-specific primer (5′-TTCCTTCACCAAAGGCATCCAGCCATTCTATG-3′ (SEQ ID NO: 46)).

The nucleotide sequence of the full-length human cDNA corresponding to the differentially regulated mRNA detected above is set forth in SEQ ID NO: 7. The cDNA comprises 3162 base pairs.

An open reading frame within the cDNA nucleotide sequence of SEQ ID NO: 7, at nucleotides 405-1835 (405-1838 including the stop codon), encodes a protein of 477 amino acids. The amino acid sequence corresponding to a predicted protein encoded by SEQ ID NO: 7 is set forth in SEQ ID NO: 8.

SEQ ID NO: 8 is similar to monocarboxylate transporters (MCTs) and contains ten predicted transmembrane domains (amino acid positions 10-29, 80-99, 107-128, 140-160, 274-295, 312-332, 339-360, 363-384, 396-416, and 433-451). MCT proteins catalyze the facilitated transport of monocarboxylates such as lactate, pyruvate, branched-chain oxo acids, ketone bodies, beta-hydroxybutyrate, and acetate (Halestrap and Price (1999), Biochem. J. 343:281-299). Table 4 summarizes the similarity ratios of SEQ ID NO: 8 with the eight known monocarboxylate transporters.

TABLE 4 Homology of LFG3 with MCT proteins

Protein  Size (amino acids)  Identity (%)  Positives (%)
MCT1     500                 17.5          34.3
MCT2     478                 19.5          35.5
MCT3     504                 19.5          34.1
MCT4     465                 19.0          33.2
MCT5     487                 22.1          36.9
MCT6     505                 16.4          31.5
MCT7     523                 20.1          35.2
MCT8     613                 15.9          27.9
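An "Identity (%)" figure such as those in Table 4 is usually the share of aligned positions at which two sequences carry the same residue. The Python sketch below shows one such calculation for a pair of already-aligned sequences. It is an assumption-laden illustration: the text does not say which alignment program or which denominator was used, the function name and the toy alignment are hypothetical, and the "Positives (%)" column is not reproduced because it depends on a substitution matrix.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of alignment columns with identical residues.

    Both inputs must be rows of the same pairwise alignment (equal length,
    gaps written as '-'). Columns that are gaps in both rows are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if not (a == gap and b == gap)]
    if not columns:
        raise ValueError("alignment has no informative columns")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


# Toy alignment: 4 identical columns out of 6.
print(round(percent_identity("MKT-LV", "MRTALV"), 1))
```

Different tools divide by the alignment length, the shorter sequence, or the query length, so values computed this way are comparable only when the same convention is used throughout.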

FIG. 5 shows the results of a hydrophobicity analysis of the amino acid sequence of SEQ ID NO: 8 using Kyte-Doolittle values (Kyte and Doolittle (1982), J. Mol. Biol. 157:105-142). Hydrophilic regions may be used to produce antigenic peptides, as described above.

Analysis by Northern blot was performed to determine the size of the mRNA transcripts that correspond to LFG3. A Northern blot containing total RNAs from various human tissues was used (Human 12-Lane MTN Blot, Clontech, Palo Alto, Calif.), and the EST containing 46104_at sequence was radioactively labeled by the random primer method and used to probe the blot. The blot was hybridized in 50% formamide, 5×SSPE, 0.1% SDS, 5× Denhardt's solution, and 0.2 mg/ml herring sperm DNA at 42° C. and washed with 0.2×SSC containing 0.1% SDS at room temperature. The Northern blot showed a single transcript for this gene, which is approximately 4.2 kb in size. This corresponds to the size of the LFG3 clone (SEQ ID NO: 7).

Example 7 Identification of Differentially Expressed mRNA in Cancers-4

The process in EXAMPLE 1 was repeated except that the marker LFG4 was used instead of the marker LFG1.

Analysis of the chip data showed that the expression of the marker LFG4 was significantly down-regulated in cancer tissue samples compared to samples from normal tissue. The expression level of LFG4 (SEQ ID NO: 9) can be measured by chip sequence fragment no. 62158_at on Affymetrix GeneChips® U95. The 62158_at sequence is derived from the EST AI123532. The expression levels of 62158_at in various malignant neoplasms, compared to normal control tissues, are shown in Table 5, where the fold change, the direction of the change (up- or down-regulation), and the p-value are also indicated. The fold change (cancerous/normal) was calculated by comparing the geometric mean of average difference in a cancerous sample set against the geometric mean of average difference in the normal tissue sample set. A fold change greater than 1.5 was considered to be significant (Wodicka et al. (1997), Nature Biotech. 15:1359-1367). Also indicated in Table 5 are, for each tissue type, the numbers of samples that are called present, absent, or marginal, together with the total number of samples in that sample set.
These data indicate that down-regulation of LFG4 may be diagnostic for cancer.

TABLE 5

                                             Geometric  Number of Samples                 Fold
Tissue      Pathology/Morphology             Mean       Total  Present  Marginal  Absent  Change  Direction  p-value
BREAST      NORMAL TISSUE, NOS               156.75     34     33       0         1
            INFILTRATING DUCT CARCINOMA      90.09      61     51       0         10      1.74    down       0.00001
COLON       NORMAL TISSUE, NOS               234.06     24     22       2         0
            ADENOCARCINOMA, NOS              64.02      36     24       0         12      3.66    down       0
KIDNEY      NORMAL TISSUE, NOS               134.17     25     23       0         2
            CLEAR CELL ADENOCARCINOMA, NOS   78.59      11     7        1         3       1.71    down       0.08272
            RENAL CELL CARCINOMA             55.31      16     9        0         7       2.43    down       0.0021
LUNG        NORMAL TISSUE, NOS               179.71     32     32       0         0
            ADENOCARCINOMA, NOS              47.39      39     17       3         19      3.79    down       0
LYMPH NODE  NORMAL TISSUE, NOS               140.51     9      7        1         1
            MALIGNANT LYMPHOMA, NOS          41.43      12     5        1         6       3.39    down       0.00207
OVARY       NORMAL TISSUE, NOS               125.19     23     21       0         2
            PAPILLARY SEROUS ADENOCARCINOMA  37.23      23     4        0         19      3.36    down       0
PROSTATE    NORMAL TISSUE, NOS               191.94     19     18       0         1
            ADENOCARCINOMA, NOS              103.47     19     16       0         3       1.86    down       0.00185
RECTUM      NORMAL TISSUE, NOS               317.95     20     20       0         0
            ADENOCARCINOMA, NOS              74.28      22     16       1         5       4.28    down       0
STOMACH     NORMAL TISSUE, NOS               161.77     18     17       0         1
            ADENOCARCINOMA, NOS              84.55      38     27       2         9       1.91    down       0.0062

The GeneChip expression results, determined by sample binding to chip sequence fragment no. 62158_at, were validated by quantitative RT-PCR (Q-RT-PCR) using the Taqman® assay (Perkin-Elmer). PCR primers (5′-AAATGTCTGATTACCCCATTTTATCAGT-3′ (SEQ ID NO: 47) and 5′-TAATCCTGAAATGAACAGCTAACA-3′ (SEQ ID NO: 48)) designed based on the sequence information file of the EST containing the Affymetrix fragment (62158_at) were used in the assay. The target gene in each RNA sample (10 ng of total RNA) was assayed relative to a reference gene. For this purpose, primers (5′-GTTTTTCCTAATTTTGGCATGAAC-3′ (SEQ ID NO: 19) and 5′-CGCCCAAGCTTTTCCTTTT-3′ (SEQ ID NO: 20)) specific to the CTBP1 gene (C-terminal binding protein 1) were used to serve as control primers. This approach provides the relative expression of the target mRNA, measured as its cycle threshold (Ct) value relative to the Ct value of CTBP1. The sample panel included total RNA pairs of normal and tumor tissues from colon, liver, lung, ovary, pancreas, and stomach (Ambion, Inc., Austin, Tex.). The Q-RT-PCR data confirm the down-regulation of LFG4 in cancer compared to normal samples.

Example 8 Cloning of Full-Length Human cDNA (LFG4) Corresponding to Differentially Expressed mRNA Species

The full-length cDNA having SEQ ID NO: 9 was obtained by rapid amplification of cDNA ends (RACE). Briefly, gene-specific oligos (5′-TAATGTTAGAGTAACAGCATTTTCCTTCAA-3′ (SEQ ID NO: 49) and 5′-TGCCCCACACTAACTCAGTTCTTGTGATG-3′ (SEQ ID NO: 50)) were designed based on the sequence of the EST containing 62158_at sequence. The oligos were used for PCR amplification of the cDNAs prepared from human brain (Clontech, Palo Alto, Calif.). The products amplified with the primers were incorporated into the PCR4-Topo vector using the Topo Cloning System (Invitrogen, Carlsbad, Calif.), followed by sequencing.

The nucleotide sequence of the full-length human cDNA corresponding to the differentially regulated mRNA detected above is set forth in SEQ ID NO: 9. The cDNA comprises 4891 base pairs.

An open reading frame within the cDNA nucleotide sequence of SEQ ID NO: 9, at nucleotides 89-1150 (89-1153 including the stop codon), encodes a protein of 354 amino acids. The amino acid sequence corresponding to a predicted protein encoded by SEQ ID NO: 9 is set forth in SEQ ID NO: 10.

SEQ ID NO: 10 is similar to rat Kilon and chicken Neurotractin (Funatsu et al. (1999), J Biol Chem 274:8224-8230; Marg et al. (1999), J Cell Biol 145:865-876). Protein sequence analysis reveals a secretory signal peptide (amino acid positions 1-33), three immunoglobulin domains (amino acid positions 47-136, 145-208, and 231-312), and six putative N-linked glycosylation sites (amino acid positions 73, 155, 275, 286, 294, and 307).
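Putative N-linked glycosylation sites are commonly predicted by scanning for the consensus sequon N-X-S/T, where X is any residue except proline. The Python sketch below illustrates such a scan; the text does not say how the six sites were identified, so this is an assumed approach, and the function name and the test peptide are hypothetical.

```python
import re

# Lookahead so that overlapping sequons (e.g. "NNSS") are all reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def n_glycosylation_sites(protein_sequence):
    """Return 1-based positions of Asn residues in N-X-S/T sequons (X != Pro)."""
    return [match.start() + 1 for match in SEQUON.finditer(protein_sequence.upper())]


# Hypothetical peptide: N2 (NGS) and N9 (NAT) match; N6 (NPT) is excluded by the proline.
print(n_glycosylation_sites("MNGSANPTNAT"))
```

A sequon match only marks a candidate site; whether a given asparagine is actually glycosylated depends on folding, localization and cell type.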

Kilon/Neurotractin is a member of the IgLON subfamily of the immunoglobulin superfamily.

IgLONs are a family of glycosylphosphatidylinositol (GPI)-linked cell adhesion molecules which are thought to modify neurite outgrowth and might play a role in cell-cell adhesion and recognition (Miyate et al. (2000), J Comparative Neurol 424:74-85).

FIG. 6 shows the results of a hydrophobicity analysis of the amino acid sequence of SEQ ID NO: 10 using Kyte-Doolittle values (Kyte and Doolittle (1982), J. Mol. Biol. 157:105-142). Hydrophilic regions may be used to produce antigenic peptides, as described above. This hydropathy plot shows the presence of a hydrophobic region at the C-terminus. In the case of GPI-anchored proteins, the addition of the GPI anchor is known to occur after the cleavage of the C-terminal hydrophobic region. A putative GPI anchor attachment site was found (Gly at amino acid position 324).

Analysis by Northern blot was performed to determine the size of the mRNA transcripts that correspond to LFG4. A Northern blot containing total RNAs from various human tissues was used (Human 12-Lane MTN Blot, Clontech, Palo Alto, Calif.), and the EST containing the 62158_at sequence was radioactively labeled by the random primer method and used to probe the blot. The blot was hybridized in 50% formamide, 5×SSPE, 0.1% SDS, 5× Denhardt's solution, and 0.2 mg/ml herring sperm DNA at 42° C. and washed with 0.2×SSC containing 0.1% SDS at room temperature. The Northern blot showed a single transcript for this gene, which is approximately 5.4 kb in size. This corresponds to the size of the LFG4 clone (SEQ ID NO: 9).

Example 9 Identification of Differentially Expressed mRNA in Cancers-5

The process in EXAMPLE 1 was repeated except that the marker LFG5 was used instead of the marker LFG1.

Analysis of the chip data showed that the expression of the marker LFG5 was significantly differentially regulated in cancer tissue samples compared to samples from normal tissue. The expression level of LFG5 (SEQ ID NO: 11) can be measured by chip sequence fragment no. 46659_at on Affymetrix GeneChips® U95. The expression levels of 46659_at in various malignant neoplasms, compared to normal control tissues, are shown in Table 6, where the fold change, the direction of the change (up- or down-regulation), and the p-value are also indicated. The fold change (cancerous/normal) was calculated by comparing the geometric mean of average difference in a cancerous sample set against the geometric mean of average difference in the normal tissue sample set. Also indicated in Table 6 are, for each tissue type, the numbers of samples that are called present, absent, or marginal together with the total number of samples in that sample set. These data indicate that differential regulation of LFG5 may be diagnostic for cancer.

TABLE 6

Tissue | Pathology/Morphology | Geometric Mean | Samples: Total | Present | Marginal | Absent | Fold Change | Direction | p-value
BREAST | NORMAL TISSUE, NOS | 152.75 | 34 | 31 | 0 | 3 | | |
BREAST | INFILTRATING DUCT CARCINOMA | 404.58 | 61 | 60 | 0 | 1 | 2.65 | up | 0
BREAST | INFILTRATING LOBULAR CARCINOMA | 277.71 | 10 | 10 | 0 | 0 | 1.82 | up | 0.07445
ESOPHAGUS | NORMAL TISSUE, NOS | 85.47 | 18 | 15 | 0 | 2 | | |
ESOPHAGUS | ADENOCARCINOMA, NOS | 373.97 | 8 | 8 | 0 | 0 | 4.38 | up | 0.0009
KIDNEY | NORMAL TISSUE, NOS | 53.58 | 25 | 17 | 0 | 8 | | |
KIDNEY | CLEAR CELL CARCINOMA | 161.36 | 11 | 11 | 0 | 0 | 3.01 | up | 0.00011
KIDNEY | RENAL CELL CARCINOMA | 249.37 | 16 | 16 | 0 | 0 | 4.65 | up | 0
LUNG | NORMAL TISSUE, NOS | 330.65 | 32 | 31 | 0 | 1 | | |
LUNG | ADENOCARCINOMA, NOS | 195.43 | 39 | 35 | 0 | 4 | 1.69 | down | 0.00538
LYMPH NODE | NORMAL TISSUE, NOS | 219.77 | 9 | 9 | 0 | 0 | | |
LYMPH NODE | MALIGNANT LYMPHOMA, NOS | 142.09 | 12 | 11 | 0 | 1 | 1.55 | down | 0.25114
OVARY | NORMAL TISSUE, NOS | 90.40 | 23 | 19 | 0 | 4 | | |
OVARY | PAPILLARY SEROUS ADENOCARCINOMA | 418.81 | 23 | 23 | 0 | 0 | 4.63 | up | 0
PANCREAS | NORMAL TISSUE, NOS | 38.53 | 20 | 12 | 0 | 8 | | |
PANCREAS | ADENOCARCINOMA, NOS | 344.37 | 25 | 25 | 0 | 0 | 8.94 | up | 0
STOMACH | NORMAL TISSUE, NOS | 185.50 | 18 | 17 | 0 | 1 | | |
STOMACH | ADENOCARCINOMA, NOS | 279.62 | 38 | 35 | 0 | 3 | 1.51 | up | 0.12664
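The fold-change values in Table 6 can be reproduced from the tabulated geometric means. The following Python sketch is illustrative only and makes one assumption that the text does not state explicitly: for rows marked "down", the table appears to report the reciprocal ratio (normal/cancerous), so that the fold change is always at least 1 and the direction is carried separately. The function name fold_change is hypothetical.

```python
def fold_change(cancer_gm: float, normal_gm: float) -> tuple[float, str]:
    """Return (fold change >= 1, direction) from two geometric means."""
    if cancer_gm <= 0 or normal_gm <= 0:
        raise ValueError("geometric means must be positive")
    if cancer_gm >= normal_gm:
        return cancer_gm / normal_gm, "up"
    # Assumed convention: report normal/cancerous when expression falls.
    return normal_gm / cancer_gm, "down"


# Pancreas: adenocarcinoma 344.37 vs normal 38.53 (values from Table 6).
fc, direction = fold_change(344.37, 38.53)
print(f"{fc:.2f} {direction}")  # 8.94 up

# Lung: adenocarcinoma 195.43 vs normal 330.65 (values from Table 6).
fc, direction = fold_change(195.43, 330.65)
print(f"{fc:.2f} {direction}")  # 1.69 down
```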

The GeneChip expression results, determined by sample binding to chip sequence fragment no. 46659_at, were validated by quantitative RT-PCR (Q-RT-PCR) using the Taqman® assay (Perkin-Elmer). PCR primers (5′-AAGGCTTTATCAGGTCTGCATATAGAATC-3′ (SEQ ID NO: 51) and 5′-GCAAAGAACCCTAATGCTATTTATCAGC-3′ (SEQ ID NO: 52)) designed based on the sequence information file of the specific Affymetrix fragment (46659_at) were used in the assay. The target gene in each RNA sample (10 ng of total RNA) was assayed relative to a reference gene. For this purpose, primers (5′-GTTTTTCCTAATTTTGGCATGAAC-3′ (SEQ ID NO: 19) and 5′-CGCCCAAGCTTTTCCTTTT-3′ (SEQ ID NO: 20)) specific to the CTBP1 gene (C-terminal binding protein 1) were used as control primers. This approach provides the relative expression of the target mRNA, measured as its cycle threshold (Ct) value relative to the Ct value of CTBP1. The sample panel included total RNA pairs of normal and tumor tissues from kidney, lung, ovary, and pancreas (Ambion, Inc., Austin, Tex.). The Q-RT-PCR data confirm the differential regulation of LFG5 in cancer compared to normal samples.
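The text does not give the formula used to express the target Ct relative to the CTBP1 Ct. One widely used calculation, shown here purely as an assumed sketch, is the comparative Ct method, which presumes roughly 100% amplification efficiency for both amplicons. The function names and all Ct values below are hypothetical.

```python
def relative_expression(ct_target: float, ct_reference: float) -> float:
    """Target abundance relative to the reference gene: 2 ** -(delta Ct)."""
    return 2.0 ** -(ct_target - ct_reference)


def tumor_vs_normal(ct_target_tumor: float, ct_ref_tumor: float,
                    ct_target_normal: float, ct_ref_normal: float) -> float:
    """Tumor/normal ratio of reference-normalized expression."""
    tumor = relative_expression(ct_target_tumor, ct_ref_tumor)
    normal = relative_expression(ct_target_normal, ct_ref_normal)
    return tumor / normal


# Hypothetical Ct values: the target crosses threshold 3 cycles earlier
# in tumor than in normal tissue, with an unchanged reference Ct.
print(tumor_vs_normal(24.0, 22.0, 27.0, 22.0))  # 8.0
```

A ratio above 1 indicates higher normalized expression in the tumor sample, and a ratio below 1 indicates lower expression.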

Example 10 Cloning of Full-Length Human cDNA (LFG5) Corresponding to Differentially Expressed mRNA Species

The full-length cDNA having SEQ ID NO: 11 was obtained by the oligo-pulling method using the GeneTrapper assay (Life Technologies, Rockville, Md.). Briefly, a gene-specific oligo (5′-GAGAAGACCAGGGAAGAAGCAG-3′ (SEQ ID NO: 53)) was designed based on the sequence of an EST containing the 46659_at sequence. The oligo was labeled with biotin and hybridized with 5 μg of single-stranded plasmid DNA (cDNA recombinants) from a human heart library (ResGen, Huntsville, Ala.) following the procedures of Sambrook et al. The hybridized cDNAs were separated with streptavidin-conjugated beads and eluted by heating. The eluted cDNA was converted to double-stranded plasmid DNA and used to transform E. coli cells (DH10B), and the transformants were screened for the longest cDNA. After positive selection was confirmed by PCR using gene-specific primers, the cDNA clone was subjected to DNA sequencing.

The nucleotide sequence of the full-length human cDNA corresponding to the differentially regulated mRNA detected above is set forth in SEQ ID NO: 11. The cDNA comprises 3098 base pairs.

An open reading frame within the cDNA nucleotide sequence of SEQ ID NO: 11, at nucleotides 223-1569 (223-1572 including the stop codon), encodes a protein of 449 amino acids. The amino acid sequence corresponding to a predicted protein encoded by SEQ ID NO: 11 is set forth in SEQ ID NO: 12.

SEQ ID NO: 12 contains a thymidylate kinase domain (amino acid positions 257-438). Thymidylate kinase is a member of the nucleotide monophosphate kinases (NMPKs), which play roles in nucleotide synthesis for RNA and DNA synthesis and are required for the pharmacological activation of therapeutic nucleoside and nucleotide analogs (Van Rompay et al. (2000), Pharmacology & Therapeutics 87:189-198). SEQ ID NO: 12 exhibits homology to a mouse thymidylate kinase (GenBank Accession No. NM_020557) which is induced during macrophage activation (Lee and O'Brien (1995), J. Immunol. 154:6094-6102). It shows 63% identity over the entire contiguous sequence.
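A percent-identity figure of this kind is computed from a pairwise alignment. The following Python sketch is illustrative only and assumes the two sequences have already been aligned (gaps written as "-") and that identity is taken over all alignment columns; other denominators, such as the length of the shorter sequence, are also in use, and the text does not say which was applied. The function name percent_identity and the toy alignment are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of alignment columns holding the same residue in both rows."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != gap
    )
    return 100.0 * matches / len(aligned_a)


# Made-up 7-column alignment: 5 identical columns, 1 gap, 1 mismatch.
print(f"{percent_identity('MKT-LLV', 'MKTALIV'):.1f}%")  # 71.4%
```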

FIG. 7 shows the results of a hydrophobicity analysis of the amino acid sequence of SEQ ID NO: 12 using Kyte-Doolittle values (Kyte and Doolittle (1982), J. Mol. Biol. 157:105-142). Hydrophilic regions may be used to produce antigenic peptides, as described above.

Analysis by Northern blot was performed to determine the size of the mRNA transcripts that correspond to LFG5. A Northern blot containing total RNAs from various human tissues was used (Human MTN Blot, Clontech, Palo Alto, Calif.), and an EST containing 82941_at sequence was radioactively labeled by the random primer method and used to probe the blot. The blot was hybridized in 50% formamide, 5×SSPE, 0.1% SDS, 5× Denhardt's solution, and 0.2 mg/ml herring sperm DNA at 42° C. and washed with 0.2×SSC containing 0.1% SDS at room temperature. The Northern blot showed a single transcript for this gene, which is approximately 3.0 kb in size. This corresponds to the size of the LFG5 clone (SEQ ID NO: 11).

Example 11 Identification of Differentially Expressed mRNA in Cancers-6

The process in EXAMPLE 1 was repeated except that the marker LFG6 was used instead of the marker LFG1.

Analysis of the chip data showed that the expression of the marker LFG6 was significantly up-regulated in cancer tissue samples compared to samples from normal tissue. The expression level of LFG6 (SEQ ID NO: 13 or 15) can be measured by chip sequence fragment no. 44103_at on Affymetrix GeneChips® U95. The 44103_at sequence is derived from the EST AA865614. The expression levels of 44103_at in various malignant neoplasms, compared to normal control tissues, are shown in Table 7, where the fold change, the direction of the change (up- or down-regulation), and the p-value are also indicated. The fold change (cancerous/normal) was calculated by comparing the geometric mean of average difference in a cancerous sample set against the geometric mean of average difference in the normal tissue sample set. A fold change greater than 1.5 was considered to be significant (Wodicka et al. (1997), Nature Biotech. 15:1359-1367). Also indicated in Table 7 are, for each tissue type, the numbers of samples that are called present, absent, or marginal together with the total number of samples in that sample set. These data indicate that up-regulation of LFG6 may be diagnostic for cancer.

TABLE 7

Tissue | Pathology/Morphology | Geometric Mean | Samples: Total | Present | Marginal | Absent | Fold Change | Direction | p-value
KIDNEY | NORMAL TISSUE, NOS | 337.71 | 25 | 25 | 0 | 0 | | |
KIDNEY | CLEAR CELL ADENOCARCINOMA, NOS | 556.82 | 11 | 11 | 0 | 0 | 1.65 | up | 0.00314
LIVER | NORMAL TISSUE, NOS | 406.93 | 19 | 18 | 0 | 1 | | |
LIVER | HEPATOCELLULAR CARCINOMA, NOS | 619.40 | 23 | 22 | 0 | 1 | 1.52 | up | 0.00303
OVARY | NORMAL TISSUE, NOS | 380.10 | 23 | 23 | 0 | 0 | | |
OVARY | PAPILLARY SEROUS ADENOCARCINOMA | 578.60 | 23 | 23 | 0 | 0 | 1.52 | up | 0.00013
PANCREAS | NORMAL TISSUE, NOS | 138.75 | 20 | 11 | 1 | 8 | | |
PANCREAS | ADENOCARCINOMA, NOS | 453.01 | 25 | 25 | 0 | 0 | 3.26 | up | 0.00002
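The significance criterion stated above (a fold change greater than 1.5) can be applied to the Table 7 values as follows. This Python sketch is illustrative only; the function name is_significant is hypothetical, and the strict "greater than" comparison follows the wording of the text.

```python
def is_significant(fold_change: float, threshold: float = 1.5) -> bool:
    """True when the fold change strictly exceeds the threshold."""
    return fold_change > threshold


# Fold changes reported in Table 7 for 44103_at (cancerous vs. normal).
table7 = {"kidney": 1.65, "liver": 1.52, "ovary": 1.52, "pancreas": 3.26}

for tissue, fc in table7.items():
    print(f"{tissue}: {fc:.2f} significant={is_significant(fc)}")
```

Every tissue listed in Table 7 exceeds the threshold, with liver and ovary only marginally above it.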

The GeneChip expression results, determined by sample binding to chip sequence fragment no. 44103_at, were validated by quantitative RT-PCR (Q-RT-PCR) using the Taqman® assay (Perkin-Elmer). PCR primers (5′-GGACGGGGAACTTGGACGC-3′ (SEQ ID NO: 54) and 5′-AAGTGCAGGGCCTCTGGGTG-3′ (SEQ ID NO: 55)) designed based on the sequence information file of the specific Affymetrix fragment (44103_at) were used in the assay. The target gene in each RNA sample (10 ng of total RNA) was assayed relative to a reference gene. For this purpose, primers (5′-GTTTTTCCTAATTTTGGCATGAAC-3′ (SEQ ID NO: 19) and 5′-CGCCCAAGCTTTTCCTTTT-3′ (SEQ ID NO: 20)) specific to the CTBP1 gene (C-terminal binding protein 1) were used as control primers. This approach provides the relative expression of the target mRNA, measured as its cycle threshold (Ct) value relative to the Ct value of CTBP1. The sample panel included total RNA pairs of normal and tumor tissues from liver and ovary (Ambion, Inc., Austin, Tex.). The Q-RT-PCR data confirm the up-regulation of LFG6 in cancer compared to normal samples.

Example 12 Cloning of Full-Length Human cDNA (LFG6) Corresponding to Differentially Expressed mRNA Species

The full-length cDNA having SEQ ID NO: 13 or 15 was obtained by the oligo-pulling method using the GeneTrapper assay (Life Technologies, Rockville, Md.). Briefly, a gene-specific oligo (5′-CGCTGGGTCATCGGACGGT-3′ (SEQ ID NO: 56)) was designed based on the sequence of an EST containing the 44103_at sequence. The oligo was labeled with biotin and hybridized with 5 μg of single-stranded plasmid DNA (cDNA recombinants) from a fully differentiated human stomach adenocarcinoma library (ResGen, Huntsville, Ala.) following the procedures of Sambrook et al. The hybridized cDNAs were separated with streptavidin-conjugated beads and eluted by heating. The eluted cDNA was converted to double-stranded plasmid DNA and used to transform E. coli cells (DH10B), and the transformants were screened for the longest cDNA. After positive selection was confirmed by PCR using gene-specific primers, the cDNA clone was subjected to DNA sequencing.

The nucleotide sequences of the full-length human cDNAs corresponding to the differentially regulated mRNA detected above are set forth in SEQ ID NOS: 13 and 15. In the former, the cDNA comprises 1893 base pairs. In the latter, the cDNA comprises 1597 base pairs.

An open reading frame within the cDNA nucleotide sequence of SEQ ID NO: 13, at nucleotides 418-1392 (418-1395 including the stop codon), encodes a protein of 325 amino acids. The amino acid sequence corresponding to a predicted protein encoded by SEQ ID NO: 13 is set forth in SEQ ID NO: 14. FIG. 9 shows the results of a hydrophobicity analysis of the amino acid sequence of SEQ ID NO: 14 using Kyte-Doolittle values (Kyte and Doolittle (1982), J. Mol. Biol. 157:105-142). Hydrophilic regions may be used to produce antigenic peptides, as described above.

An open reading frame within the cDNA nucleotide sequence of SEQ ID NO: 15, at nucleotides 271-1431 (271-1434 including the stop codon), encodes a protein of 387 amino acids. The amino acid sequence corresponding to a predicted protein encoded by SEQ ID NO: 15 is set forth in SEQ ID NO: 16. FIG. 10 shows the results of a hydrophobicity analysis of the amino acid sequence of SEQ ID NO: 16 using Kyte-Doolittle values (Kyte and Doolittle (1982), J. Mol. Biol. 157:105-142). Hydrophilic regions may be used to produce antigenic peptides, as described above.

SEQ ID NOS: 14 and 16 contain a ubiquitin homologue (UBQ) domain (amino acid positions 239-300). SEQ ID NOS: 14 and 16 are similar to rat Sharpin protein (Lim et al. (2001), Mol Cell Neurosci 17:385-397). Sharpin directly interacts with the ankyrin repeats of Shank protein, which functions in the organization of cytoskeletal complexes and intracellular signaling at specialized cell junctions (Sheng and Kim (2000), J Cell Sci 113:1851-1856).

Analysis by Northern blot was performed to determine the size of the mRNA transcripts that correspond to LFG6. A Northern blot containing total RNAs from various human tissues was used (Human 12-Lane MTN Blot, Clontech, Palo Alto, Calif.), and an EST containing the 44103_at sequence was radioactively labeled by the random primer method and used to probe the blot. The blot was hybridized in 50% formamide, 5×SSPE, 0.1% SDS, 5× Denhardt's solution, and 0.2 mg/ml herring sperm DNA at 42° C. and washed with 0.2×SSC containing 0.1% SDS at room temperature. The Northern blot showed three transcripts for this gene, which are approximately 2.2 kb, 1.5 kb, and 1.2 kb in size. This corresponds to the sizes of the LFG6 clones (SEQ ID NOS: 13 and 15).

Although the present invention has been described in detail with reference to examples above, it is understood that various modifications can be made without departing from the spirit of the invention. Accordingly, the invention is limited only by the following claims. All cited patents, patent applications and publications referred to in this application are herein incorporated by reference in their entirety.

1. An isolated nucleic acid molecule selected from the group consisting of: (a) an isolated nucleic acid molecule comprising SEQ ID NO: 1, 3, 5, 7, 9, 11, 13 or 15, (b) an isolated nucleic acid molecule encoding SEQ ID NO: 2, 4, 6, 8, 10, 12, 14 or 16, (c) an isolated nucleic acid molecule that encodes a protein that is expressed in cancer and that exhibits at least about 75% nucleotide sequence identity over the entire contiguous sequence of SEQ ID NO: 1, 3, 5, 7, 9, 11, 13 or 15, and (d) an isolated nucleic acid molecule comprising the complement of a nucleic acid molecule of (a), (b) or (c).
2. The isolated nucleic acid molecule of claim 1, wherein the nucleic acid molecule comprises nucleotides 390-4880 of SEQ ID NO: 1, nucleotides 12-4904 of SEQ ID NO: 3, nucleotides 424-1908 of SEQ ID NO: 5, nucleotides 405-1835 of SEQ ID NO: 7, nucleotides 89-1150 of SEQ ID NO: 9, nucleotides 223-1569 of SEQ ID NO: 11, nucleotides 418-1392 of SEQ ID NO: 13, or nucleotides 271-1431 of SEQ ID NO: 15.
3. The isolated nucleic acid molecule of claim 1, wherein the nucleic acid molecule comprises nucleotides 390-4883 of SEQ ID NO: 1, nucleotides 12-4907 of SEQ ID NO: 3, nucleotides 424-1911 of SEQ ID NO: 5, nucleotides 405-1838 of SEQ ID NO: 7, nucleotides 89-1153 of SEQ ID NO: 9, nucleotides 223-1572 of SEQ ID NO: 11, nucleotides 418-1395 of SEQ ID NO: 13, or nucleotides 271-1434 of SEQ ID NO: 15.
4. The isolated nucleic acid molecule of claim 1, wherein the nucleic acid molecule consists of nucleotides 390-4883 of SEQ ID NO: 1, nucleotides 12-4907 of SEQ ID NO: 3, nucleotides 424-1908 of SEQ ID NO: 5, nucleotides 405-1835 of SEQ ID NO: 7, nucleotides 89-1153 of SEQ ID NO: 9, nucleotides 223-1569 of SEQ ID NO: 11, nucleotides 418-1395 of SEQ ID NO: 13, or nucleotides 271-1434 of SEQ ID NO: 15.
5. The isolated nucleic acid molecule of claims 1-4, wherein said nucleic acid molecule is operably linked to one or more expression control elements.
6. A vector comprising an isolated nucleic acid molecule of claims 1-4.
7. A host cell transformed to contain the nucleic acid molecule of claims 1-4.
8. A host cell comprising a vector of claim 6.
9. A host cell of claim 8, wherein said host cell is selected from the group consisting of prokaryotic host cells and eukaryotic host cells.
10. A method for producing a polypeptide comprising culturing a host cell transformed with the nucleic acid molecule of claims 1-4 under conditions in which the protein encoded by said nucleic acid molecule is expressed.
11. The method of claim 10, wherein said host cell is selected from the group consisting of prokaryotic host cells and eukaryotic host cells.
12. An isolated polypeptide produced by the method of claim 10.
13. An isolated polypeptide or protein selected from the group consisting of a protein comprising the amino acid sequence of SEQ ID NO: 2, 4, 6, 8, 10, 12, 14 or 16, and a protein having at least about 75% amino acid sequence identity with SEQ ID NO: 2, 4, 6, 8, 10, 12, 14 or 16.
14. An isolated antibody or antigen-binding antibody fragment that binds to a polypeptide of claim 13.
15. An antibody of claim 14 wherein said antibody is a monoclonal or a polyclonal antibody.
16. A method of identifying an agent which modulates the expression of a nucleic acid encoding a protein of claim 13, comprising: exposing cells which express the nucleic acid to the agent; and determining whether the agent modulates expression of said nucleic acid, thereby identifying an agent which modulates the expression of a nucleic acid encoding the protein.
17. A method of identifying an agent which modulates the level of or at least one activity of a protein of claim 13, comprising: exposing cells which express the protein to the agent; determining whether the agent modulates the level of or at least one activity of said protein, thereby identifying an agent which modulates the level of or at least one activity of the protein.
18. The method of claim 17, wherein the agent modulates one activity of the protein.
19. A method of modulating the expression of a nucleic acid encoding a protein of claim 13, comprising: administering an effective amount of an agent which modulates the expression of a nucleic acid encoding the protein.
20. A method of modulating at least one activity of a protein of claim 13, comprising: administering an effective amount of an agent which modulates at least one activity of the protein.
21. A method of identifying binding partners for a protein of claim 13, comprising: exposing said protein to a potential binding partner; and determining if the potential binding partner binds to said protein, thereby identifying binding partners for the protein.
22. A method of identifying an agent which modulates the interaction between a binding partner and a protein of claim 13, comprising: exposing said protein with said partner to the agent; and determining whether the agent modulates association of the binding partner with said protein, thereby identifying an agent which modulates association of a binding partner with said protein.
23. A method of modulating the interaction between a binding partner and a protein of claim 13, comprising: administering an effective amount of an agent which modulates association of a binding partner with said protein.
24. A non-human transgenic animal modified to contain a nucleic acid molecule of claims 1-4.
25. The transgenic animal of claim 24, wherein the nucleic acid molecule contains a mutation that prevents expression of the encoded protein.
26. A method of treating a disease state in a subject, comprising: inserting into a diseased cell a gene construct comprising an isolated nucleic acid molecule of claims 1-4 linked to a promoter or enhancer element such that expression of said nucleic molecule causes suppression of said disease.
27. The method of claim 26, wherein said inserting into a diseased cell is accomplished in vivo.
28. The method of claim 26, wherein said inserting into a diseased cell further comprises use of a viral or plasmid agent and is accomplished either in vitro or in vivo.
29. A method of diagnosing a disease state in a subject, comprising: determining the level of expression of a nucleic acid molecule of claim 1.
30. The method of claim 26, wherein the disease state is cancer.
31. The method of claim 26, wherein the disease state is a malignant neoplasm.
32. The method of claim 31, wherein the malignant neoplasm occurs in the breast, colon, esophagus, kidney, liver, lung, lymph node, ovary, pancreas, prostate, rectum, and/or stomach.
33. A composition comprising a diluent and a polypeptide or protein selected from the group consisting of: an isolated polypeptide comprising the amino acid sequence of SEQ ID NO: 2, 4, 6, 8, 10, 12, 14 or 16; an isolated polypeptide comprising a fragment of at least 10 amino acids of SEQ ID NO: 2, 4, 6, 8, 10, 12, 14 or 16; an isolated polypeptide comprising conservative amino acid substitutions of SEQ ID NO: 2, 4, 6, 8, 10, 12, 14 or 16; an isolated polypeptide comprising naturally occurring amino acid sequence variants of SEQ ID NO: 2, 4, 6, 8, 10, 12, 14 or 16; and an isolated polypeptide exhibiting at least about 75% amino acid sequence identity with SEQ ID NO: 2, 4, 6, 8, 10, 12, 14 or 16.
34. A method of diagnosing a disease state in a subject, comprising: determining the level of expression of a protein of claim 13.
35. The method of claim 34, wherein the disease state is cancer.
36. The method of claim 34, wherein the disease state is a malignant neoplasm.